Methods for Diagnosis of Sepsis

ABSTRACT

Methods for diagnosis of sepsis are disclosed. In particular, the invention relates to the use of biomarkers for aiding diagnosis, prognosis, and treatment of sepsis, and to a panel of biomarkers that can be used to distinguish sepsis from noninfectious sources of inflammation, such as that caused by traumatic injury, surgery, autoimmune disease, thrombosis, or systemic inflammatory response syndrome (SIRS).

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with Government support under contracts AI057229, AI109662, and LM007033 awarded by the National Institutes of Health. The Government has certain rights in the invention.

TECHNICAL FIELD

The present invention pertains generally to methods for diagnosis of sepsis. In particular, the invention relates to the use of biomarkers for aiding diagnosis, prognosis, and treatment of sepsis, and more specifically to biomarkers that can be used to distinguish sepsis from noninfectious sources of inflammation, such as that caused by traumatic injury, surgery, autoimmune disease, thrombosis, or systemic inflammatory response syndrome (SIRS).

BACKGROUND

Sepsis, a syndrome of systemic inflammation in response to infection, kills approximately 750,000 people in the United States every year (Angus et al. (2001) Crit Care Med 29:1303-1310). It is also the single most expensive condition treated in the US, costing the healthcare system more than $24 billion annually (Lagu et al. (2012) Crit Care Med 40:754-761; Torio and Andrews (2013) National Inpatient Hospital Costs: The Most Expensive Conditions by Payer, 2011 (Statistical Brief #160, Agency for Healthcare Research and Quality, Rockville, MD, August 2013)). Prompt diagnosis and treatment of sepsis are crucial to reducing mortality, with every hour of delay increasing mortality risk (Gaieski et al. (2010) Crit Care Med 38:1045-1053; Ferrer et al. (2014) Crit Care Med 42:1749-1755). Sepsis is defined by the presence of systemic inflammatory response syndrome (SIRS) in addition to a known or suspected source of infection (Dellinger et al. (2013) Intensive Care Med 39:165-228). However, SIRS is not specific for sepsis, as sterile inflammation can arise as a nonspecific response to trauma, surgery, thrombosis, and other non-infectious insults. Thus, sepsis can be difficult to distinguish clinically from systemic inflammation caused by non-infectious sources, such as tissue trauma (Coburn et al. (2012) JAMA 308:502-511). There is no ‘gold standard’ blood test for distinguishing patients with infections at the time of diagnosis, before results become available from standard microbiological cultures. One of the most common biomarkers of infection, procalcitonin, has a summary area under the receiver operating characteristic curve (AUC) of 0.78 (range 0.66-0.90) (Tang et al. (2007) Lancet Infect Dis 7:210-217; Uzzan et al. (2006) Crit Care Med 34:1996-2003; Cheval et al. (2000) Intensive Care Med 26 Suppl 2:S153-158; Ugarte et al. (1999) Crit Care Med 27:498-504). Several groups have evaluated whether cytokine or gene expression arrays can accurately diagnose sepsis; however, due to the highly variable nature of host response and human genetics, no robust diagnostic signature has been found (Cobb et al. (2009) Ann Surg 250:531-539; Xiao et al. (2011) J Exp Med 208:2581-2590; Pankla et al. (2009) Genome Biol 10:R127; Tang et al. (2009) Crit Care Med 37:882-888; Wong (2012) Crit Care 16:204; Johnson et al. (2007) Ann Surg 245:611-621). Indeed, “finding the ‘perfect’ sepsis marker has been one of the most elusive dreams in modern medicine” (Vega et al. (2009) Crit Care Med 37:1806-1807).

Both infections and tissue trauma activate many of the same innate immune receptor families, such as the Toll-like receptors and NOD-like receptors, and consequently activate largely overlapping transcriptional pathways. Thus, distinguishing conserved downstream effects attributable solely to infections has been exceedingly difficult. Recent work has shown that there are pattern recognition receptors potentially specific to pathogen response, such as the C-type lectin, CEACAM, and siglec receptor families (Geijtenbeek et al. (2009) Nat Rev Immunol 9:465-479; Crocker (2007) Nat Rev Immunol 7:255-266; Kuespert et al. (2006) Curr Opin Cell Biol 18:565-571). Hence, it may be possible to differentiate an infection-specific immune response from sterile inflammation.

The ongoing search for new therapies for sepsis, and for new prognostic and diagnostic biomarkers, has generated several dozen microarray-based genome-wide expression studies over the past decade, variously focusing on diagnosis, prognosis, pathogen response, and underlying sepsis pathophysiology (Johnson et al., supra; Maslove et al. (2014) Trends Mol Med 20:204-213). Despite tremendous gains in the understanding of gene expression in sepsis, few insights have translated to improvements in clinical practice. Importantly, many of these studies have been deposited into public repositories such as the NIH Gene Expression Omnibus (GEO) and ArrayExpress, and thus there is now a wealth of publicly available data on sepsis. In particular, there are several studies comparing patients with sepsis to patients with non-infectious inflammation (such as SIRS) that occurs after major surgery, traumatic injury, or in non-sepsis-related ICU admission (thrombosis, respiratory failure, etc.).

One dataset in particular, the Inflammation and Host Response to Injury Program (Glue Grant) (Cobb et al. (2005) Proc Natl Acad Sci USA 102:4801-4806), has yielded several important findings about the effects of time on gene expression after trauma and in sepsis. One part of the Glue Grant longitudinally examined gene expression in patients after severe traumatic injuries. Several groups have examined these data with respect to time; notable findings are that (1) more than 80% of expressed genes show differential expression after traumatic injury (Xiao et al., supra), (2) different clusters of genes recover over markedly different time periods (Seok et al. (2013) Proc Natl Acad Sci USA 110:3507-3512), (3) differing scenarios of inflammation such as trauma, burns, and endotoxicosis exhibit similar gene expression changes (Seok et al., supra), and (4) the extent to which post-trauma gene expression profiles differ from those of healthy patients, and their degree of gene expression recovery over time, are correlated with clinical outcomes (Desai et al. (2011) PLoS Med 8:e1001093; Warren et al. (2009) Mol Med 15:220-227). There is thus growing understanding of the importance of the changes that underlie recovery from trauma, and their impact on specific clinical outcomes.

There remains a need for sensitive and specific diagnostic tests for sepsis that can distinguish sepsis from noninfectious sources of inflammation, such as that caused by traumatic injury and SIRS.

SUMMARY

The invention relates to the use of biomarkers for diagnosis of sepsis. In particular, the inventors have discovered biomarkers that can be used to diagnose sepsis and to distinguish sepsis from noninfectious sources of systemic inflammation, such as that caused by traumatic injury, surgery, autoimmune disease, thrombosis, or systemic inflammatory response syndrome (SIRS). These biomarkers can be used alone or in combination with one or more additional biomarkers or relevant clinical parameters in prognosis, diagnosis, or monitoring treatment of sepsis.

Biomarkers that can be used in the practice of the invention include polynucleotides comprising nucleotide sequences from genes or RNA transcripts of genes, including but not limited to, CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, C3AR1, KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1.

In certain embodiments, a panel of biomarkers is used for diagnosis of sepsis. Biomarker panels of any size can be used in the practice of the invention. Biomarker panels for diagnosing sepsis typically comprise at least 3 biomarkers and up to 30 biomarkers, including any number of biomarkers in between, such as 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 biomarkers. In certain embodiments, the invention includes a biomarker panel comprising at least 3, at least 4, or at least 5, or at least 6, or at least 7, or at least 8, or at least 9, or at least 10, or at least 11 or more biomarkers. Although smaller biomarker panels are usually more economical, larger biomarker panels (i.e., greater than 30 biomarkers) have the advantage of providing more detailed information and can also be used in the practice of the invention.

In one embodiment, the biomarker panel comprises a plurality of biomarkers for diagnosing sepsis, wherein the plurality of biomarkers comprises one or more polynucleotides comprising a nucleotide sequence from a gene or an RNA transcript of a gene selected from the group consisting of CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, C3AR1, KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1. In certain embodiments, the biomarker panel comprises at least 11 biomarkers. In one embodiment, the biomarker panel comprises a CEACAM1 polynucleotide, a ZDHHC19 polynucleotide, a C9orf95 polynucleotide, a GNA15 polynucleotide, a BATF polynucleotide, a C3AR1 polynucleotide, a KIAA1370 polynucleotide, a TGFBI polynucleotide, a MTCH1 polynucleotide, a RPGRIP1 polynucleotide, and a HLA-DPB1 polynucleotide.

In one aspect, the invention includes a method for diagnosing sepsis in a subject. The method comprises a) measuring the level of a plurality of biomarkers in a biological sample derived from the subject; and b) analyzing the levels of the biomarkers in conjunction with respective reference value ranges for the plurality of biomarkers, wherein differential expression of one or more biomarkers in the biological sample compared to reference value ranges of the biomarkers for a non-infected control subject indicates that the subject has sepsis. The reference value ranges can represent the levels of one or more biomarkers found in one or more samples of one or more subjects without sepsis (e.g., a healthy subject or a non-infected subject). Alternatively, the reference values can represent the levels of one or more biomarkers found in one or more samples of one or more subjects with sepsis. In certain embodiments, the levels of the biomarkers are compared to time-matched reference value ranges for non-infected or infected/septic subjects.
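The comparison in step b) can be sketched as a per-biomarker check against a (low, high) reference interval. This is a minimal illustration under stated assumptions, not the inventors' implementation: the function names, the interval representation, and the numeric ranges are hypothetical, and real ranges would be derived from control (optionally time-matched) subjects.

```python
from typing import Dict, Tuple

# Hypothetical representation: biomarker name -> (low, high) reference interval
# derived from non-infected (or time-matched) control subjects.
ReferenceRanges = Dict[str, Tuple[float, float]]


def flag_levels(levels: Dict[str, float], ranges: ReferenceRanges) -> Dict[str, str]:
    """Label each measured biomarker 'up', 'down', or 'within' its reference interval."""
    flags: Dict[str, str] = {}
    for gene, value in levels.items():
        low, high = ranges[gene]
        if value > high:
            flags[gene] = "up"
        elif value < low:
            flags[gene] = "down"
        else:
            flags[gene] = "within"
    return flags


def any_differential(flags: Dict[str, str]) -> bool:
    """True if at least one biomarker lies outside its reference interval."""
    return any(flag != "within" for flag in flags.values())


if __name__ == "__main__":
    # Illustrative numbers only; not measured data.
    ranges = {"CEACAM1": (5.0, 7.0), "TGFBI": (8.0, 10.0)}
    flags = flag_levels({"CEACAM1": 7.8, "TGFBI": 9.1}, ranges)
    print(flags)                    # {'CEACAM1': 'up', 'TGFBI': 'within'}
    print(any_differential(flags))  # True
```

Keeping the flagging separate from the decision rule makes it easy to swap in a different rule, for example requiring a specific up/down pattern rather than any single deviation.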

In certain embodiments, the invention includes a method for diagnosing sepsis in a subject using a biomarker panel described herein. The method comprises: a) collecting a biological sample from the subject; b) measuring each biomarker of the biomarker panel in the biological sample; and c) comparing the measured values of each biomarker with respective reference value ranges for the biomarkers, wherein differential expression of the biomarkers of the biomarker panel in the biological sample compared to reference values of the biomarkers for a control subject indicates that the subject has sepsis.

In one embodiment, the invention includes a method for diagnosing sepsis in a subject, the method comprising: a) collecting a biological sample from the subject; b) measuring levels of expression of CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, C3AR1, KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1 biomarkers in the biological sample; and c) analyzing the levels of expression of each biomarker in conjunction with respective reference value ranges for the biomarkers, wherein increased levels of expression of the CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, and C3AR1 biomarkers and decreased levels of expression of the KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1 biomarkers compared to the reference value ranges for the biomarkers for a non-infected control subject indicate that the subject has sepsis.
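One strict reading of this embodiment, in which each of the six overexpressed genes must exceed its reference interval and each of the five underexpressed genes must fall below it, can be sketched as follows. The all-genes requirement, the function name, and the (low, high) interval format are assumptions for illustration; the text does not say how partial matches should be handled.

```python
from typing import Dict, Tuple

# Genes reported as increased in sepsis (first group) and decreased (second group).
UP_GENES = ("CEACAM1", "ZDHHC19", "C9orf95", "GNA15", "BATF", "C3AR1")
DOWN_GENES = ("KIAA1370", "TGFBI", "MTCH1", "RPGRIP1", "HLA-DPB1")


def matches_sepsis_pattern(
    levels: Dict[str, float], ranges: Dict[str, Tuple[float, float]]
) -> bool:
    """Hypothetical strict rule: all up genes above range AND all down genes below range."""
    missing = [g for g in UP_GENES + DOWN_GENES if g not in levels]
    if missing:
        raise KeyError(f"missing biomarker measurements: {missing}")
    ups_high = all(levels[g] > ranges[g][1] for g in UP_GENES)
    downs_low = all(levels[g] < ranges[g][0] for g in DOWN_GENES)
    return ups_high and downs_low
```

A looser rule (for example, a majority of genes moving in the expected direction) would be an equally plausible reading; the strict version is shown only because it is the simplest to state.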

In another embodiment, the invention includes a method for diagnosing sepsis in a subject comprising determining a sepsis score for the subject based on the levels of the biomarkers according to the following formula:

$\begin{array}{l}\sqrt[6]{\text{CEACAM1} \ast \text{ZDHHC19} \ast \text{C9orf95} \ast \text{GNA15} \ast \text{BATF} \ast \text{C3AR1}} \; - \\ \frac{5}{6}\sqrt[5]{\text{KIAA1370} \ast \text{TGFBI} \ast \text{MTCH1} \ast \text{RPGRIP1} \ast \text{HLA-DPB1}},\end{array}$

wherein a higher sepsis score for the subject compared to reference value ranges for a non-infected control subject indicates that the subject has sepsis.
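The sepsis score is the geometric mean of the six overexpressed genes minus 5/6 of the geometric mean of the five underexpressed genes. A minimal Python sketch of that arithmetic follows; it assumes strictly positive, already-normalized expression values (the formula itself does not state a normalization), and the function names are illustrative.

```python
import math
from typing import Dict, Sequence

UP_GENES = ("CEACAM1", "ZDHHC19", "C9orf95", "GNA15", "BATF", "C3AR1")
DOWN_GENES = ("KIAA1370", "TGFBI", "MTCH1", "RPGRIP1", "HLA-DPB1")


def geometric_mean(values: Sequence[float]) -> float:
    """n-th root of the product, computed in log space to avoid overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def sepsis_score(levels: Dict[str, float]) -> float:
    """Sixth root of the up-gene product minus 5/6 times the fifth root of the down-gene product."""
    up = geometric_mean([levels[g] for g in UP_GENES])
    down = geometric_mean([levels[g] for g in DOWN_GENES])
    return up - (5 / 6) * down


if __name__ == "__main__":
    # Illustrative values only: all up genes at 8.0, all down genes at 3.0.
    example = {**{g: 8.0 for g in UP_GENES}, **{g: 3.0 for g in DOWN_GENES}}
    print(round(sepsis_score(example), 6))  # 5.5
```

Computing the root of a product as the mean of logarithms gives the same value as the formula while avoiding overflow when many large expression values are multiplied.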

Methods of the invention, as described herein, can be used to distinguish a diagnosis of sepsis for a subject from noninfectious sources of inflammation, such as that caused by traumatic injury, surgery, autoimmune disease, thrombosis, or systemic inflammatory response syndrome (SIRS).

The biological sample may comprise, for example, whole blood, buffy coat, plasma, serum, peripheral blood mononuclear cells (PBMCs), band cells, neutrophils, monocytes, or T cells.

Biomarker polynucleotides (e.g., coding transcripts) can be detected, for example, by microarray analysis, polymerase chain reaction (PCR), reverse transcriptase polymerase chain reaction (RT-PCR), Northern blot, or serial analysis of gene expression (SAGE).

In another aspect, the invention includes a method of determining an infection Z-score for a subject suspected of having sepsis, the method comprising: a) collecting a biological sample from the subject; b) measuring the levels of a plurality of sepsis biomarkers in the biological sample; and c) determining the infection Z-score for the biomarkers by subtracting the geometric mean of the expression levels of all biomarkers that are underexpressed compared to control reference values for the biomarkers from the geometric mean of the expression levels of all biomarkers that are overexpressed compared to control reference values for the biomarkers, and multiplying the difference by the ratio of the number of biomarkers that are overexpressed to the number of biomarkers that are underexpressed compared to control reference values for the biomarkers.
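Read literally, step c) computes (geometric mean of the overexpressed levels minus geometric mean of the underexpressed levels) multiplied by the ratio n_over/n_under. The sketch below implements that wording under the same positive-value assumption used for geometric means; the function name is hypothetical. This literal reading scales the whole difference by n_over/n_under, whereas the sepsis-score formula earlier applies a 5/6 factor to the underexpressed term alone, so the two expressions do not give identical numbers.

```python
import math
from typing import Dict, Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """n-th root of the product of positive values, computed in log space."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def infection_z_score(
    levels: Dict[str, float],
    over_genes: Sequence[str],
    under_genes: Sequence[str],
) -> float:
    """(GM(overexpressed) - GM(underexpressed)) * (n_over / n_under), per the wording of step c)."""
    if not over_genes or not under_genes:
        raise ValueError("need at least one overexpressed and one underexpressed biomarker")
    gm_over = geometric_mean([levels[g] for g in over_genes])
    gm_under = geometric_mean([levels[g] for g in under_genes])
    return (gm_over - gm_under) * (len(over_genes) / len(under_genes))


if __name__ == "__main__":
    over = ["CEACAM1", "ZDHHC19", "C9orf95", "GNA15", "BATF", "C3AR1"]
    under = ["KIAA1370", "TGFBI", "MTCH1", "RPGRIP1", "HLA-DPB1"]
    # Illustrative values only.
    levels = {**{g: 8.0 for g in over}, **{g: 3.0 for g in under}}
    print(round(infection_z_score(levels, over, under), 6))  # 6.0
```

Passing the gene lists as arguments, rather than hard-coding the 11-gene set, mirrors the text's general "plurality of sepsis biomarkers".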

In certain embodiments, the infection Z-score is calculated from the expression levels of a plurality of biomarkers comprising one or more polynucleotides comprising a nucleotide sequence from a gene or an RNA transcript of a gene selected from the group consisting of CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, C3AR1, KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1. In one embodiment, the plurality of biomarkers comprises a CEACAM1 polynucleotide, a ZDHHC19 polynucleotide, a C9orf95 polynucleotide, a GNA15 polynucleotide, a BATF polynucleotide, a C3AR1 polynucleotide, a KIAA1370 polynucleotide, a TGFBI polynucleotide, a MTCH1 polynucleotide, a RPGRIP1 polynucleotide, and a HLA-DPB1 polynucleotide.

In another aspect, the invention includes a method of treating a subject having sepsis, the method comprising: a) diagnosing the subject with sepsis according to a method described herein; and b) administering a therapeutically effective amount of broad-spectrum antibiotics to the subject if the subject has a positive sepsis diagnosis.

In another aspect, the invention includes a method of treating a subject suspected of having sepsis, the method comprising: a) receiving information regarding the diagnosis of the subject according to a method described herein; and b) administering a therapeutically effective amount of broad-spectrum antibiotics to the subject if the subject has a positive sepsis diagnosis.

In certain embodiments, subject data is analyzed by one or more methods including, but not limited to, multivariate linear discriminant analysis (LDA), receiver operating characteristic (ROC) analysis, principal component analysis (PCA), ensemble data mining methods, cell specific significance analysis of microarrays (csSAM), and multi-dimensional protein identification technology (MUDPIT) analysis.
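ROC analysis of a continuous score, such as a sepsis score or infection Z-score, is commonly summarized by the area under the ROC curve (AUC). As a minimal sketch, assuming higher scores indicate infection, the empirical AUC equals the fraction of case/control pairs in which the case scores higher, with ties counted as one half (the Mann-Whitney formulation). The function name and inputs are illustrative, and the quadratic-time loop favors clarity over speed.

```python
from typing import Sequence


def roc_auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Empirical ROC AUC: P(case score > control score), with ties counted as 0.5."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Illustrative scores only (not study data).
    sepsis = [2.1, 1.7, 0.9]
    non_infected = [0.4, 0.9, -0.3]
    print(round(roc_auc(sepsis, non_infected), 4))  # 0.9444
```

An AUC of 1.0 means every case outscores every control, and 0.5 corresponds to chance-level discrimination.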

In another embodiment, the invention includes a method for evaluating the effect of an agent for treating sepsis in a subject using a biomarker panel described herein, the method comprising: analyzing the measured value of each biomarker of the biomarker panel in samples derived from the subject before and after the subject is treated with the agent in conjunction with respective reference value ranges for each biomarker.

In another embodiment, the invention includes a method for monitoring the efficacy of a therapy for treating sepsis in a subject using a biomarker panel described herein, the method comprising: analyzing the measured value of each biomarker of the biomarker panel in samples derived from the subject before and after the subject undergoes said therapy, in conjunction with respective reference value ranges for each biomarker.

In another embodiment, the invention includes a method for monitoring the efficacy of a therapy for treating sepsis in a subject, the method comprising: measuring levels of expression of CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, C3AR1, KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1 biomarkers in a first sample derived from the subject before the subject undergoes the therapy and a second sample derived from the subject after the subject undergoes the therapy, wherein increased levels of expression of the CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, and C3AR1 biomarkers and decreased levels of expression of the KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1 biomarkers in the second sample compared to the levels of expression of the biomarkers in the first sample indicate that the subject is worsening, and decreased levels of expression of the CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, and C3AR1 biomarkers and increased levels of expression of the KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1 biomarkers in the second sample compared to the levels of expression of the biomarkers in the first sample indicate that the subject is improving.
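The before/after comparison in this embodiment can be sketched as a direction check between two samples from the same subject. The sketch is illustrative only: it requires every gene in each group to move in the stated direction, and it returns "indeterminate" for mixed or unchanged patterns, a case the text does not address. Both choices and the function name are assumptions.

```python
from typing import Dict

UP_GENES = ("CEACAM1", "ZDHHC19", "C9orf95", "GNA15", "BATF", "C3AR1")
DOWN_GENES = ("KIAA1370", "TGFBI", "MTCH1", "RPGRIP1", "HLA-DPB1")


def therapy_trend(first: Dict[str, float], second: Dict[str, float]) -> str:
    """Compare a post-therapy sample (second) with a pre-therapy sample (first)."""
    up_rise = all(second[g] > first[g] for g in UP_GENES)
    down_fall = all(second[g] < first[g] for g in DOWN_GENES)
    if up_rise and down_fall:
        return "worsening"
    up_fall = all(second[g] < first[g] for g in UP_GENES)
    down_rise = all(second[g] > first[g] for g in DOWN_GENES)
    if up_fall and down_rise:
        return "improving"
    # Mixed or unchanged patterns are not covered by the embodiment (assumption).
    return "indeterminate"
```

For example, a subject whose up genes fall from 8.0 to 6.0 while the down genes rise from 3.0 to 4.0 would be classified as "improving".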

In another aspect, the invention includes a kit for diagnosing sepsis in a subject. The kit may include a container for holding a biological sample isolated from a human subject suspected of having sepsis, at least one agent that specifically detects a sepsis biomarker, and printed instructions for reacting the agent with the biological sample or a portion of the biological sample to detect the presence or amount of at least one sepsis biomarker in the biological sample. The agents may be packaged in separate containers. The kit may further comprise one or more control reference samples and reagents for performing PCR or microarray analysis for detection of biomarkers as described herein.

In certain embodiments, the kit includes agents for detecting polynucleotides of a biomarker panel comprising a plurality of biomarkers for diagnosing sepsis, wherein one or more biomarkers are selected from the group consisting of a CEACAM1 polynucleotide, a ZDHHC19 polynucleotide, a C9orf95 polynucleotide, a GNA15 polynucleotide, a BATF polynucleotide, a C3AR1 polynucleotide, a KIAA1370 polynucleotide, a TGFBI polynucleotide, a MTCH1 polynucleotide, a RPGRIP1 polynucleotide, and a HLA-DPB1 polynucleotide. In one embodiment, the kit includes agents for detecting biomarkers of a biomarker panel comprising a CEACAM1 polynucleotide, a ZDHHC19 polynucleotide, a C9orf95 polynucleotide, a GNA15 polynucleotide, a BATF polynucleotide, a C3AR1 polynucleotide, a KIAA1370 polynucleotide, a TGFBI polynucleotide, a MTCH1 polynucleotide, a RPGRIP1 polynucleotide, and a HLA-DPB1 polynucleotide. Furthermore, the kit may include agents for detecting more than one biomarker panel, such as two or three biomarker panels, which can be used alone or together in any combination, and/or in combination with clinical parameters for diagnosis of sepsis.

In certain embodiments, the kit comprises a microarray for analysis of a plurality of biomarker polynucleotides. In one embodiment, the kit comprises a microarray comprising an oligonucleotide that hybridizes to a CEACAM1 polynucleotide, an oligonucleotide that hybridizes to a ZDHHC19 polynucleotide, an oligonucleotide that hybridizes to a C9orf95 polynucleotide, an oligonucleotide that hybridizes to a GNA15 polynucleotide, an oligonucleotide that hybridizes to a BATF polynucleotide, an oligonucleotide that hybridizes to a C3AR1 polynucleotide, an oligonucleotide that hybridizes to a KIAA1370 polynucleotide, an oligonucleotide that hybridizes to a TGFBI polynucleotide, an oligonucleotide that hybridizes to a MTCH1 polynucleotide, an oligonucleotide that hybridizes to a RPGRIP1 polynucleotide, and an oligonucleotide that hybridizes to a HLA-DPB1 polynucleotide.

In another aspect, the invention includes an assay comprising: a) measuring at least one biomarker in a biological sample collected from a subject suspected of having sepsis; and b) comparing the measured value of the at least one biomarker in the biological sample with reference values for the biomarker for a control subject, wherein differential expression of the at least one biomarker in the biological sample compared to the reference values indicates that the subject has sepsis. The biological sample may comprise, for example, whole blood, buffy coat, plasma, serum, peripheral blood mononuclear cells (PBMCs), band cells, neutrophils, monocytes, or T cells. In one embodiment, the assay further comprises determining an infection Z-score for the subject.

In one embodiment, the invention includes an assay comprising: a) measuring each biomarker of a biomarker panel, described herein, in a biological sample collected from a subject suspected of having sepsis; and b) comparing the measured value of each biomarker of the biomarker panel in the biological sample with reference values for each biomarker for a control subject, wherein differential expression of the biomarkers in the biological sample compared to the reference values indicates that the subject has sepsis. The biological sample may comprise, for example, whole blood, buffy coat, plasma, serum, peripheral blood mononuclear cells (PBMCs), band cells, neutrophils, monocytes, or T cells. The assay may further comprise determining an infection Z-score for the subject.

In other embodiments, measuring at least one biomarker comprises performing microarray analysis, polymerase chain reaction (PCR), reverse transcriptase polymerase chain reaction (RT-PCR), a Northern blot, or a serial analysis of gene expression (SAGE). In one embodiment, microarray analysis is performed with a microarray comprising an oligonucleotide that hybridizes to a CEACAM1 polynucleotide, an oligonucleotide that hybridizes to a ZDHHC19 polynucleotide, an oligonucleotide that hybridizes to a C9orf95 polynucleotide, an oligonucleotide that hybridizes to a GNA15 polynucleotide, an oligonucleotide that hybridizes to a BATF polynucleotide, an oligonucleotide that hybridizes to a C3AR1 polynucleotide, an oligonucleotide that hybridizes to a KIAA1370 polynucleotide, an oligonucleotide that hybridizes to a TGFBI polynucleotide, an oligonucleotide that hybridizes to a MTCH1 polynucleotide, an oligonucleotide that hybridizes to a RPGRIP1 polynucleotide, and an oligonucleotide that hybridizes to a HLA-DPB1 polynucleotide.

In another aspect, the invention includes a diagnostic system comprising a storage component (i.e., memory) for storing data, wherein the storage component has instructions for determining the diagnosis of the subject stored therein; a computer processor for processing data, wherein the computer processor is coupled to the storage component and configured to execute the instructions stored in the storage component in order to receive patient data and analyze patient data according to an algorithm; and a display component for displaying information regarding the diagnosis of the patient. The storage component may include instructions for calculating an infection Z-score or sepsis score, as described herein (see Examples 1 and 2). Additionally, the storage component may further include instructions for performing multivariate linear discriminant analysis (LDA), receiver operating characteristic (ROC) analysis, principal component analysis (PCA), ensemble data mining methods, cell specific significance analysis of microarrays (csSAM), or multi-dimensional protein identification technology (MUDPIT) analysis.

In certain embodiments, the invention includes a computer-implemented method for diagnosing a patient suspected of having sepsis, the computer performing steps comprising: a) receiving inputted patient data comprising values for the level of a plurality of sepsis biomarkers in a biological sample from the patient; b) analyzing the level of a plurality of sepsis biomarkers and comparing with respective reference value ranges for the sepsis biomarkers; c) calculating an infection Z-score or sepsis score for the patient based on the levels of the sepsis biomarkers; d) calculating the likelihood that the patient has sepsis based on the value of the infection Z-score; and e) displaying information regarding the diagnosis of the patient.

In certain embodiments, the inputted patient data comprises values for the levels of at least 11 sepsis biomarkers in a biological sample from the patient. For example, the inputted patient data may comprise values for the levels of a CEACAM1 polynucleotide, a ZDHHC19 polynucleotide, a C9orf95 polynucleotide, a GNA15 polynucleotide, a BATF polynucleotide, a C3AR1 polynucleotide, a KIAA1370 polynucleotide, a TGFBI polynucleotide, a MTCH1 polynucleotide, a RPGRIP1 polynucleotide, and a HLA-DPB1 polynucleotide.

In another embodiment, the invention includes a computer-implemented method for diagnosing a patient suspected of having sepsis, the computer performing steps comprising: a) receiving inputted patient data comprising values for levels of expression of CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, C3AR1, KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1 biomarkers in a biological sample from the patient; b) analyzing the level of each biomarker and comparing with respective reference value ranges for each biomarker; c) calculating a sepsis score for the patient based on the levels of expression of the biomarkers according to the following formula:

$\begin{array}{l}\sqrt[6]{\text{CEACAM1} \ast \text{ZDHHC19} \ast \text{C9orf95} \ast \text{GNA15} \ast \text{BATF} \ast \text{C3AR1}} \; - \\ \frac{5}{6}\sqrt[5]{\text{KIAA1370} \ast \text{TGFBI} \ast \text{MTCH1} \ast \text{RPGRIP1} \ast \text{HLA-DPB1}};\end{array}$

d) calculating the likelihood that the patient has sepsis based on the value of the sepsis score, wherein a higher sepsis score for the patient compared to reference value ranges for a non-infected control subject indicates that the patient has sepsis; and e) displaying information regarding the diagnosis of the patient.
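Steps a) to e) can be sketched as a small pipeline. Everything beyond the sepsis-score formula is an assumption for illustration. The text does not specify how the likelihood in step d) is derived, so a logistic mapping with caller-supplied `intercept` and `slope` is used as a placeholder; those coefficients would have to be fit on labelled data. The per-biomarker comparison of step b) is folded into a single comparison of the score with `reference_upper`, a hypothetical stand-in for the upper end of a control reference range.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence

UP_GENES = ("CEACAM1", "ZDHHC19", "C9orf95", "GNA15", "BATF", "C3AR1")
DOWN_GENES = ("KIAA1370", "TGFBI", "MTCH1", "RPGRIP1", "HLA-DPB1")


def geometric_mean(values: Sequence[float]) -> float:
    """n-th root of the product of positive values, computed in log space."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


@dataclass
class Diagnosis:
    score: float
    above_reference: bool
    likelihood: float


def diagnose(levels: Dict[str, float], reference_upper: float,
             intercept: float, slope: float) -> Diagnosis:
    # a) the inputted patient data must cover all 11 biomarkers
    missing = [g for g in UP_GENES + DOWN_GENES if g not in levels]
    if missing:
        raise KeyError(f"missing biomarker measurements: {missing}")
    # b), c) sepsis score per the formula above
    up = geometric_mean([levels[g] for g in UP_GENES])
    down = geometric_mean([levels[g] for g in DOWN_GENES])
    score = up - (5 / 6) * down
    # d) hypothetical logistic calibration of the score into a probability
    likelihood = 1.0 / (1.0 + math.exp(-(intercept + slope * score)))
    return Diagnosis(score, score > reference_upper, likelihood)


def display(result: Diagnosis) -> str:
    # e) human-readable summary for the display component
    flag = "yes" if result.above_reference else "no"
    return (f"sepsis score={result.score:.2f}; above reference: {flag}; "
            f"estimated likelihood={result.likelihood:.0%}")


if __name__ == "__main__":
    example = {**{g: 8.0 for g in UP_GENES}, **{g: 3.0 for g in DOWN_GENES}}
    # Placeholder coefficients and reference bound, not fitted values.
    print(display(diagnose(example, reference_upper=2.0, intercept=-5.5, slope=1.0)))
```

With these placeholder inputs the score is 5.5 and the logistic argument is zero, so the printed likelihood is 50%; real deployments would replace both the bound and the coefficients with values derived from validation cohorts.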

These and other embodiments of the subject invention will readily occur to those of skill in the art in view of the disclosure herein.

BRIEF DESCRIPTION OF THE FIGURES

FIGS. 1A and 1B show a labelled principal components analysis (PCA) comparing sterile SIRS/trauma patients versus sepsis patients. FIG. 1A shows that sterile SIRS/trauma and sepsis patients appear to be largely separable in the transcriptomic space, with only a minimal non-separable set. FIG. 1B shows the same labelled PCA with labels updated to reflect patients in recovery from non-infectious SIRS/trauma, and patients with hospital-acquired sepsis; the ‘late’ group (>48 hours after hospital admission) is much harder to separate. N = 1094 combined from 15 studies.

FIGS. 2A-2D show effect sizes of the 11-gene set. Forest plots are shown for random effects model estimates of effect size of the positive genes, comparing SIRS/trauma/ICU to infection/sepsis patients, in each of the discovery cohorts.

FIGS. 3A-3F show results of the 11-gene set in the discovery and neutrophil validation datasets. FIG. 3A shows ROC curves separating sterile SIRS/ICU/trauma patients from those with sepsis in the discovery datasets. FIG. 3B shows ROC curves separating trauma patients with infections from time-matched trauma patients without infection in the Glue Grant neutrophil validation datasets. FIGS. 3C and 3D show average infection Z-scores for non-infected patients versus patients within +/-24 hours of diagnosis in the Glue Grant buffy coat discovery samples (FIG. 3C) and neutrophil validation samples (FIG. 3D), after >1 day since injury. In both cases there is a significant effect due to both time and infection status. Boxplots of infection Z-score by time since injury are shown for buffy coat discovery samples (FIG. 3E) and neutrophil validation samples (FIG. 3F): patients never infected are compared to patients >5 days prior to infection, 5-to-1 days prior to infection, +/-1 day of diagnosis (cases), and 2-to-5 days after infection diagnosis. The JT trend test was significant (p<0.01) for an increasing trend from never infected to +/-1 day of infection for each time point after admission.

FIGS. 4A-4D show no-controls datasets of trauma/ICU patients who develop ventilator-associated pneumonia (VAP). These datasets did not include non-infected patients, so they were empiric-Bayes co-normalized with time-matched Glue Grant patients. The gray line shows the Glue Grant loess curve for (FIG. 4A) EMEXP3001, (FIG. 4B) GSE6377, and (FIG. 4C) GSE12838 neutrophil and whole blood samples. In all cases, only the first 8 days since admission are shown, and patients are censored >1 day after diagnosis of infection. FIG. 4D shows ROC curves comparing patients within +/-1 day of diagnosis (dark gray points in FIGS. 4A-4C) with time-matched non-infected Glue Grant patients. See Table 5 for further dataset details.

FIGS. 5A and 5B show discrimination of healthy versus sepsis patients. Eight independent validation datasets that met inclusion criteria (peripheral whole blood or neutrophils, sampled within 48 hours of sepsis diagnosis) were tested with the infection Z-score. FIG. 5A shows infection Z-scores for all n = 446 patients, which were combined in a single violin plot; error bars show middle quartiles. P-values were calculated with the Wilcoxon rank-sum test. FIG. 5B shows separate ROC curves for each of the 8 datasets discriminating sepsis patients from healthy controls. Mean ROC AUC = 0.98. See Table 6 for further dataset details.

FIGS. 6A and 6B show cell-type enrichment analyses. Shown are standardized enrichment scores (Z-scores, black dots) for human immune cell types for both (FIG. 6A) the entire set of 82 genes found to be significant in multi-cohort analysis, and (FIG. 6B) the 11-gene set found after forward search (subset of the 82 genes). FIG. 6B also shows a boxplot of the distributions of Z-scores.

FIGS. 7A and 7B show labelled PCA comparing healthy patients versus SIRS/trauma patients versus sepsis patients. FIG. 7A shows that healthy patients, SIRS/trauma patients, and sepsis patients appear to be largely separable in the transcriptomic space, with only a minimal non-separable set. FIG. 7B shows the same labelled PCA with labels updated to reflect patients in recovery from non-infectious SIRS/trauma, and patients with hospital-acquired sepsis; the ‘late’ group (>48 hours after hospital admission) is much harder to separate. N = 1316 combined from 15 studies.

FIG. 8 shows the neutrophil percentage for the Glue Grant patients with both complete blood count and microarray data. Median neutrophil percentage is between 75% and 85% for all time points. Patients who were ever infected during their hospital stay are compared to patients never infected during their hospital stay.

FIGS. 9A-9I show violin plots for the datasets that were included in the discovery multi-cohort analysis, including GPSSSI Unique (FIG. 9A), GSE28750 (FIG. 9B), GSE32707 (FIG. 9C), and GSE40012 (FIG. 9D). Shown are the datasets comparing SIRS/ICU/trauma to sepsis patients at admission. Error bars show middle quartiles. P-values are computed using the Wilcoxon rank-sum test. FIGS. 9E-9I show violin plots for the datasets included in the discovery multi-cohort analysis for Glue Grant buffy coat cohorts, comparing non-infected trauma patients to sepsis patients at matched time points, including [1,3) (FIG. 9E), [3,6) (FIG. 9F), [6,10) (FIG. 9G), [10,18) (FIG. 9H), and [18,24) (FIG. 9I). Error bars show middle quartiles. P-values are computed using the Wilcoxon rank-sum test.

FIGS. 10A and 10B show performance of the infection Z-score in the sorted monocytes from the Glue Grant cohort. These are the same patients as the neutrophil validation cohort in FIGS. 3B, 3D, and 3F. FIG. 10A shows ROC curves for each of the four sampled time bins. FIG. 10B shows boxplots of infection Z-score by time since injury. Patients never infected are compared to patients >5 days prior to infection, 5-to-1 days prior to infection, within +/-1 day of diagnosis (cases), and 2-to-5 days post infection.

FIGS. 11A and 11B show the performance of the infection Z-score in sorted T cells from the Glue Grant cohort. These are the same patients as in the neutrophil validation cohort in FIGS. 3B, 3D, and 3F. FIG. 11A shows ROC curves for each of the four sampled time bins. FIG. 11B shows boxplots of the infection Z-score by time since injury. Never-infected patients are compared to patients >5 days prior to infection, 5-to-1 days prior to infection, within +/- 1 day of diagnosis (cases), and 2-to-5 days post infection.

FIGS. 12A and 12B show linear models of SIRS criteria and the infection Z-score. FIG. 12A shows logistic regression models for Glue Grant patients with both SIRS data and microarray data available. SIRS criteria are represented as binary variables. The first model uses the SIRS criteria in combination; the second model adds the infection Z-score. Significance codes: p < 0.001 ‘***’; p < 0.01 ‘**’; p < 0.05 ‘*’. FIG. 12B shows boxplots of the log odds of infection predicted for each patient by the logistic regression models in FIG. 12A.

FIG. 13A shows the infection Z-score in non-time-matched datasets. Four datasets compared SIRS/ICU/trauma patients to sepsis patients at non-matched time points. These datasets tested neutrophils (GSE5772, N=93), whole blood (EMTAB1548, N=73), and PBMCs (GSE9960, N=30; EMEXP3621, N=10). See Table 7 for further dataset details. FIGS. 13B-13E show violin plots of infection Z-scores for these non-time-matched datasets: GSE5772 (FIG. 13B), GSE9960 (FIG. 13C), EMTAB1548 (FIG. 13D), and EMEXP3621 (FIG. 13E). Error bars show middle quartiles; differences were tested with the Wilcoxon rank-sum test.

FIGS. 14A and 14B compare infection Z-scores in patients with acute infections with those in healthy controls and in patients with autoimmune diseases. GSE22098 compares healthy controls to patients with acute autoimmune inflammation or acute infections. The infection Z-score shows good discrimination of infection from both healthy patients and patients with autoimmune inflammation. FIG. 14A shows violin plots; error bars show middle quartiles. Patients with autoimmune inflammation were compared with patients with sepsis using the Wilcoxon rank-sum test. FIG. 14B shows an ROC plot of autoimmune patients or healthy controls versus septic patients.

FIG. 15 shows a schematic of the entire integrated multi-cohort analysis.

FIG. 16 shows a schematic diagram of a diagnostic system.

FIG. 17 shows the schema for the systematic search and selection of clinical sepsis datasets.

FIGS. 18A-18C show ROC plots for discriminating patients with sepsis/acute infections from patients with non-infectious inflammation at admission. FIG. 18A shows the 11-gene score. FIG. 18B shows the FAIM3:PLAC8 ratio. FIG. 18C shows the Septicyte Lab.

FIGS. 19A-19C show ROC plots for discriminating trauma patients with sepsis/acute infections from time-matched never-infected trauma patients. FIG. 19A shows the 11-gene score. FIG. 19B shows the FAIM3:PLAC8 ratio. FIG. 19C shows the Septicyte Lab.

DETAILED DESCRIPTION

The practice of the present invention will employ, unless otherwise indicated, conventional methods of pharmacology, chemistry, biochemistry, recombinant DNA techniques, and immunology, within the skill of the art. Such techniques are explained fully in the literature. See, e.g., J.R. Brown, Sepsis: Symptoms, Diagnosis and Treatment (Public Health in the 21st Century Series, Nova Science Publishers, Inc., 2013); Sepsis and Non-infectious Systemic Inflammation: From Biology to Critical Care (J. Cavaillon, C. Adrie eds., Wiley-Blackwell, 2008); Sepsis: Diagnosis, Management and Health Outcomes (Allergies and Infectious Diseases, N. Khardori ed., Nova Science Pub Inc., 2014); Handbook of Experimental Immunology, Vols. I-IV (D.M. Weir and C.C. Blackwell eds., Blackwell Scientific Publications); A.L. Lehninger, Biochemistry (Worth Publishers, Inc., current edition); Sambrook, et al., Molecular Cloning: A Laboratory Manual (3rd Edition, 2001); Methods In Enzymology (S. Colowick and N. Kaplan eds., Academic Press, Inc.).

All publications, patents and patent applications cited herein, whether supra or infra, are hereby incorporated by reference in their entireties.

I. Definitions

In describing the present invention, the following terms will be employed, and are intended to be defined as indicated below.

It must be noted that, as used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to “a biomarker” includes a mixture of two or more biomarkers, and the like.

The term “about,” particularly in reference to a given quantity, is meant to encompass deviations of plus or minus five percent.
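As an illustration only, the plus-or-minus five percent tolerance can be expressed as a simple check. This sketch assumes the five percent is taken relative to the stated quantity (so “about 60%” would cover 57% to 63%); the function name `is_about` is hypothetical.

```python
def is_about(value: float, target: float, tolerance: float = 0.05) -> bool:
    """Return True if `value` lies within +/- `tolerance` (a fraction) of `target`.

    Assumption: the tolerance is relative to the target quantity itself.
    """
    return abs(value - target) <= tolerance * abs(target)


# Example: 104 is "about 100"; 106 is not.
print(is_about(104, 100), is_about(106, 100))
```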

A “biomarker” in the context of the present invention refers to a biological compound, such as a polynucleotide, which is differentially expressed in a sample taken from patients having sepsis as compared to a comparable sample taken from control subjects (e.g., a person with a negative diagnosis, a normal or healthy subject, or a non-infected subject). The biomarker can be a nucleic acid, a fragment of a nucleic acid, a polynucleotide, or an oligonucleotide that can be detected and/or quantified. Sepsis biomarkers include polynucleotides comprising nucleotide sequences from genes or RNA transcripts of genes, including but not limited to, CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, C3AR1, KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1.

The terms “polypeptide” and “protein” refer to a polymer of amino acid residues and are not limited to a minimum length. Thus, peptides, oligopeptides, dimers, multimers, and the like, are included within the definition. Both full-length proteins and fragments thereof are encompassed by the definition. The terms also include post-expression modifications of the polypeptide, for example, glycosylation, acetylation, phosphorylation, hydroxylation, oxidation, and the like.

The terms “polynucleotide,” “oligonucleotide,” “nucleic acid” and “nucleic acid molecule” are used herein to include a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. These terms refer only to the primary structure of the molecule. Thus, the terms include triple-, double- and single-stranded DNA, as well as triple-, double- and single-stranded RNA. They also include modifications, such as by methylation and/or by capping, and unmodified forms of the polynucleotide. More particularly, the terms “polynucleotide,” “oligonucleotide,” “nucleic acid” and “nucleic acid molecule” include polydeoxyribonucleotides (containing 2-deoxy-D-ribose), polyribonucleotides (containing D-ribose), and any other type of polynucleotide which is an N- or C-glycoside of a purine or pyrimidine base. There is no intended distinction in length between the terms “polynucleotide,” “oligonucleotide,” “nucleic acid” and “nucleic acid molecule,” and these terms are used interchangeably.

The phrase “differentially expressed” refers to differences in the quantity and/or the frequency of a biomarker present in a sample taken from patients having, for example, sepsis as compared to a control subject or non-infected subject. For example, a biomarker can be a polynucleotide which is present at an elevated level or at a decreased level in samples of patients with sepsis compared to samples of control subjects. Alternatively, a biomarker can be a polynucleotide which is detected at a higher frequency or at a lower frequency in samples of patients with sepsis compared to samples of control subjects. A biomarker can be differentially present in terms of quantity, frequency, or both.

A polynucleotide is differentially expressed between two samples if the amount of the polynucleotide in one sample is statistically significantly different from the amount of the polynucleotide in the other sample. For example, a polynucleotide is differentially expressed in two samples if it is present at least about 120%, at least about 130%, at least about 150%, at least about 180%, at least about 200%, at least about 300%, at least about 500%, at least about 700%, at least about 900%, or at least about 1000% greater than it is present in the other sample, or if it is detectable in one sample and not detectable in the other.

Alternatively or additionally, a polynucleotide is differentially expressed in two sets of samples if the frequency of detecting the polynucleotide in samples of patients suffering from sepsis is statistically significantly higher or lower than in the control samples. For example, a polynucleotide is differentially expressed in two sets of samples if it is observed at least about 120%, at least about 130%, at least about 150%, at least about 180%, at least about 200%, at least about 300%, at least about 500%, at least about 700%, at least about 900%, or at least about 1000% more frequently or less frequently in one set of samples than in the other set of samples.

A “similarity value” is a number that represents the degree of similarity between two things being compared. For example, a similarity value may be a number that indicates the overall similarity between a patient’s expression profile using specific phenotype-related biomarkers and reference value ranges for the biomarkers in one or more control samples or a reference expression profile (e.g., the similarity to a “sepsis” expression profile or a “sterile inflammation” expression profile). The similarity value may be expressed as a similarity metric, such as a correlation coefficient, or may simply be expressed as the expression level difference, or the aggregate of the expression level differences, between levels of biomarkers in a patient sample and a control sample or reference expression profile.

The terms “subject,” “individual,” and “patient” are used interchangeably herein and refer to any mammalian subject for whom diagnosis, prognosis, treatment, or therapy is desired, particularly humans. Other subjects may include cattle, dogs, cats, guinea pigs, rabbits, rats, mice, horses, and so on. In some cases, the methods of the invention find use in experimental animals, in veterinary applications, and in the development of animal models for disease, including, but not limited to, rodents including mice, rats, and hamsters; and primates.

As used herein, a “biological sample” refers to a sample of tissue, cells, or fluid isolated from a subject, including but not limited to, for example, blood, buffy coat, plasma, serum, blood cells (e.g., peripheral blood mononuclear cells (PBMCs), band cells, neutrophils, monocytes, or T cells), fecal matter, urine, bone marrow, bile, spinal fluid, lymph fluid, samples of the skin, external secretions of the skin, respiratory, intestinal, and genitourinary tracts, tears, saliva, milk, organs, biopsies, and also samples of in vitro cell culture constituents, including, but not limited to, conditioned media resulting from the growth of cells and tissues in culture medium, e.g., recombinant cells, and cell components.

A “test amount” of a biomarker refers to an amount of a biomarker present in a sample being tested. A test amount can be either an absolute amount (e.g., µg/ml) or a relative amount (e.g., relative intensity of signals).

A “diagnostic amount” of a biomarker refers to an amount of a biomarker in a subject’s sample that is consistent with a diagnosis of sepsis. A diagnostic amount can be either an absolute amount (e.g., µg/ml) or a relative amount (e.g., relative intensity of signals).

A “control amount” of a biomarker can be any amount or range of amounts which is to be compared against a test amount of a biomarker. For example, a control amount of a biomarker can be the amount of a biomarker in a person without sepsis. A control amount can be either an absolute amount (e.g., µg/ml) or a relative amount (e.g., relative intensity of signals).

The term “antibody” encompasses polyclonal and monoclonal antibody preparations, as well as preparations including hybrid antibodies, altered antibodies, chimeric antibodies, and humanized antibodies, as well as: hybrid (chimeric) antibody molecules (see, for example, Winter et al. (1991) Nature 349:293-299; and U.S. Pat. No. 4,816,567); F(ab′)₂ and F(ab) fragments; F_(v) molecules (noncovalent heterodimers, see, for example, Inbar et al. (1972) Proc Natl Acad Sci USA 69:2659-2662; and Ehrlich et al. (1980) Biochem 19:4091-4096); single-chain Fv molecules (sFv) (see, e.g., Huston et al. (1988) Proc Natl Acad Sci USA 85:5879-5883); dimeric and trimeric antibody fragment constructs; minibodies (see, e.g., Pack et al. (1992) Biochem 31:1579-1584; Cumber et al. (1992) J Immunology 149B:120-126); humanized antibody molecules (see, e.g., Riechmann et al. (1988) Nature 332:323-327; Verhoeyen et al. (1988) Science 239:1534-1536; and U.K. Patent Publication No. GB 2,276,169, published 21 Sep. 1994); and any functional fragments obtained from such molecules, wherein such fragments retain specific-binding properties of the parent antibody molecule.

“Detectable moieties” or “detectable labels” contemplated for use in the invention include, but are not limited to, radioisotopes, fluorescent dyes such as fluorescein, phycoerythrin, Cy-3, Cy-5, allophycocyanin, DAPI, Texas Red, rhodamine, Oregon green, Lucifer yellow, and the like, green fluorescent protein (GFP), red fluorescent protein (DsRed), Cyan Fluorescent Protein (CFP), Yellow Fluorescent Protein (YFP), Cerianthus Orange Fluorescent Protein (cOFP), alkaline phosphatase (AP), beta-lactamase, chloramphenicol acetyltransferase (CAT), adenosine deaminase (ADA), aminoglycoside phosphotransferase (neo^(r), G418^(r)), dihydrofolate reductase (DHFR), hygromycin-B-phosphotransferase (HPH), thymidine kinase (TK), lacZ (encoding β-galactosidase), xanthine guanine phosphoribosyltransferase (XGPRT), Beta-Glucuronidase (gus), Placental Alkaline Phosphatase (PLAP), Secreted Embryonic Alkaline Phosphatase (SEAP), or Firefly or Bacterial Luciferase (LUC). Enzyme tags are used with their cognate substrate. The terms also include color-coded microspheres of known fluorescent light intensities (see, e.g., microspheres with xMAP technology produced by Luminex (Austin, TX)); microspheres containing quantum dot nanocrystals, for example, containing different ratios and combinations of quantum dot colors (e.g., Qdot nanocrystals produced by Life Technologies (Carlsbad, CA)); glass coated metal nanoparticles (see, e.g., SERS nanotags produced by Nanoplex Technologies, Inc. (Mountain View, CA)); barcode materials (see, e.g., sub-micron sized striped metallic rods such as Nanobarcodes produced by Nanoplex Technologies, Inc.); encoded microparticles with colored bar codes (see, e.g., CellCard produced by Vitra Bioscience, vitrabio.com); and glass microparticles with digital holographic code images (see, e.g., CyVera microbeads produced by Illumina (San Diego, CA)). As with many of the standard procedures associated with the practice of the invention, skilled artisans will be aware of additional labels that can be used.

“Diagnosis” as used herein generally includes determination as to whether a subject is likely affected by a given disease, disorder or dysfunction. The skilled artisan often makes a diagnosis on the basis of one or more diagnostic indicators, i.e., a biomarker, the presence, absence, or amount of which is indicative of the presence or absence of the disease, disorder or dysfunction.

“Prognosis” as used herein generally refers to a prediction of the probable course and outcome of a clinical condition or disease. A prognosis of a patient is usually made by evaluating factors or symptoms of a disease that are indicative of a favorable or unfavorable course or outcome of the disease. It is understood that the term “prognosis” does not necessarily refer to the ability to predict the course or outcome of a condition with 100% accuracy. Instead, the skilled artisan will understand that the term “prognosis” refers to an increased probability that a certain course or outcome will occur; that is, that a course or outcome is more likely to occur in a patient exhibiting a given condition, when compared to those individuals not exhibiting the condition.

“Substantially purified” refers to nucleic acid molecules or proteins that are removed from their natural environment and are isolated or separated, and are at least about 60% free, preferably about 75% free, and most preferably about 90% free, from other components with which they are naturally associated.

II. Modes of Carrying Out the Invention

Before describing the present invention in detail, it is to be understood that this invention is not limited to particular formulations or process parameters as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments of the invention only, and is not intended to be limiting.

Although a number of methods and materials similar or equivalent to those described herein can be used in the practice of the present invention, the preferred materials and methods are described herein.

The invention relates to the use of biomarkers either alone or in combination with clinical parameters for diagnosis of sepsis. In particular, the inventors have discovered a panel of biomarkers whose expression profile can be used to diagnose sepsis and to distinguish sepsis from noninfectious sources of systemic inflammation, such as caused by traumatic injury, surgery, autoimmune disease, thrombosis, or systemic inflammatory response syndrome (see Example 1).

A. Biomarkers

Biomarkers that can be used in the practice of the invention include polynucleotides comprising nucleotide sequences from genes or RNA transcripts of genes, including but not limited to, CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, C3AR1, KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1. Differential expression of these biomarkers is associated with sepsis, and therefore expression profiles of these biomarkers are useful for diagnosing sepsis and distinguishing sepsis from non-infectious inflammatory conditions, such as caused by traumatic injury, surgery, autoimmune disease, thrombosis, or systemic inflammatory response syndrome (SIRS).

Accordingly, in one aspect, the invention provides a method for diagnosing sepsis in a subject, comprising measuring the level of a plurality of biomarkers in a biological sample derived from a subject suspected of having sepsis, and analyzing the levels of the biomarkers and comparing them with respective reference value ranges for the biomarkers, wherein differential expression of one or more biomarkers in the biological sample compared to one or more biomarkers in a control sample indicates that the subject has sepsis. When analyzing the levels of biomarkers in a biological sample, the reference value ranges used for comparison can represent the level of one or more biomarkers found in one or more samples of one or more subjects without sepsis (i.e., normal or non-infected control samples). Alternatively, the reference values can represent the level of one or more biomarkers found in one or more samples of one or more subjects with sepsis. In certain embodiments, the levels of the biomarkers are compared to time-matched reference values for non-infected or infected/septic subjects.
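The comparison step above can be sketched as a per-biomarker check of each measured level against its reference value range. This is a minimal illustration, assuming each range is a (low, high) pair on the same scale as the measured level; the function name and all numeric values are hypothetical and are not reference ranges from this disclosure.

```python
def compare_to_reference(
    levels: dict[str, float],
    reference_ranges: dict[str, tuple[float, float]],
) -> dict[str, str]:
    """Label each biomarker as over, under, or within its reference range."""
    calls = {}
    for gene, value in levels.items():
        low, high = reference_ranges[gene]
        if value > high:
            calls[gene] = "over"      # above the control range
        elif value < low:
            calls[gene] = "under"     # below the control range
        else:
            calls[gene] = "within"    # consistent with the control range
    return calls


# Hypothetical measured levels and control reference ranges.
levels = {"CEACAM1": 9.0, "HLA-DPB1": 2.0, "BATF": 5.0}
ranges = {"CEACAM1": (4.0, 7.0), "HLA-DPB1": (3.0, 6.0), "BATF": (4.0, 6.0)}
print(compare_to_reference(levels, ranges))
```

Time-matched comparison, as in the last embodiment above, would simply swap in a reference-range table chosen for the subject's time since admission or injury.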

The biological sample obtained from the subject to be diagnosed is typically whole blood, buffy coat, plasma, serum, or blood cells (e.g., peripheral blood mononuclear cells (PBMCs), band cells, neutrophils, monocytes, or T cells), but can be any sample from bodily fluids, tissue, or cells that contain the expressed biomarkers. A “control” sample, as used herein, refers to a biological sample, such as a bodily fluid, tissue, or cells, that is not diseased. That is, a control sample is obtained from a normal or non-infected subject (e.g., an individual known not to have sepsis). A biological sample can be obtained from a subject by conventional techniques. For example, blood can be obtained by venipuncture, and solid tissue samples can be obtained by surgical techniques according to methods well known in the art.

In certain embodiments, a panel of biomarkers is used for diagnosis of sepsis. Biomarker panels of any size can be used in the practice of the invention. Biomarker panels for diagnosing sepsis typically comprise at least 3 biomarkers and up to 30 biomarkers, including any number of biomarkers in between, such as 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 biomarkers. In certain embodiments, the invention includes a biomarker panel comprising at least 3, or at least 4, or at least 5, or at least 6, or at least 7, or at least 8, or at least 9, or at least 10, or at least 11 or more biomarkers. Although smaller biomarker panels are usually more economical, larger biomarker panels (i.e., greater than 30 biomarkers) have the advantage of providing more detailed information and can also be used in the practice of the invention.

In certain embodiments, the invention includes a panel of biomarkers for diagnosing sepsis comprising one or more polynucleotides comprising a nucleotide sequence from a gene or an RNA transcript of a gene selected from the group consisting of CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, C3AR1, KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1. In one embodiment, the panel of biomarkers comprises a CEACAM1 polynucleotide, a ZDHHC19 polynucleotide, a C9orf95 polynucleotide, a GNA15 polynucleotide, a BATF polynucleotide, a C3AR1 polynucleotide, a KIAA1370 polynucleotide, a TGFBI polynucleotide, an MTCH1 polynucleotide, an RPGRIP1 polynucleotide, and an HLA-DPB1 polynucleotide.

In certain embodiments, an infection Z-score is used for diagnosis of sepsis. The infection Z-score is calculated by subtracting the geometric mean of the expression levels of all measured biomarkers that are underexpressed compared to control reference values for the biomarkers from the geometric mean of the expression levels of all measured biomarkers that are overexpressed compared to control reference values for the biomarkers, and multiplying the difference by the ratio of the number of biomarkers that are overexpressed to the number of biomarkers that are underexpressed compared to control reference values for the biomarkers. A higher infection Z-score for the subject compared to reference value ranges for non-infected control subjects indicates that the subject has sepsis (see Example 1).

In other embodiments, a sepsis score is used for diagnosis of sepsis. A sepsis score for a patient can be calculated based on the levels of expression of CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, C3AR1, KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1 biomarkers according to the following formula:

$\sqrt[6]{\mathrm{CEACAM1} \cdot \mathrm{ZDHHC19} \cdot \mathrm{C9orf95} \cdot \mathrm{GNA15} \cdot \mathrm{BATF} \cdot \mathrm{C3AR1}} - \frac{5}{6}\sqrt[5]{\mathrm{KIAA1370} \cdot \mathrm{TGFBI} \cdot \mathrm{MTCH1} \cdot \mathrm{RPGRIP1} \cdot \text{HLA-DPB1}}.$

A higher sepsis score for a subject compared to reference value ranges for non-infected control subjects indicates that the subject has sepsis (see Example 2).
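A minimal Python sketch of the sepsis score formula follows. It assumes each input is a positive expression level for the named gene on whatever normalized scale the assay reports (the scale is not specified here), since a geometric mean is only defined for positive values. The gene lists and the 5/6 weight (number of down-regulated genes over number of up-regulated genes, applied to the second term only) follow the formula; the function names and example values are hypothetical.

```python
from math import prod

# Genes in the first (sixth-root) and second (fifth-root) terms of the formula.
UP_GENES = ("CEACAM1", "ZDHHC19", "C9orf95", "GNA15", "BATF", "C3AR1")
DOWN_GENES = ("KIAA1370", "TGFBI", "MTCH1", "RPGRIP1", "HLA-DPB1")


def geometric_mean(values: list[float]) -> float:
    """n-th root of the product of n positive values."""
    return prod(values) ** (1.0 / len(values))


def sepsis_score(expr: dict[str, float]) -> float:
    """Geometric mean of the up genes minus 5/6 times the geometric mean of the down genes."""
    up = geometric_mean([expr[g] for g in UP_GENES])
    down = geometric_mean([expr[g] for g in DOWN_GENES])
    return up - (len(DOWN_GENES) / len(UP_GENES)) * down


# Hypothetical sample in which every gene has level 1.0: score = 1 - 5/6.
flat = {g: 1.0 for g in UP_GENES + DOWN_GENES}
print(sepsis_score(flat))
```

Per the paragraph above, the resulting score would then be compared against reference value ranges for non-infected control subjects; those ranges are not given in this section.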

In another aspect, the invention includes an assay comprising: a) measuring each biomarker of a biomarker panel, described herein, in a biological sample collected from a subject suspected of having sepsis; and b) comparing the measured value of each biomarker of the biomarker panel in the biological sample with reference values for each biomarker for a control subject, wherein differential expression of the biomarkers in the biological sample compared to the reference values indicates that the subject has sepsis. In certain embodiments, the assay further comprises determining an infection Z-score, as described herein.

The methods described herein may be used to determine if a patient having systemic inflammation should be treated for sepsis. For example, a patient is selected for treatment for sepsis if the patient has a positive sepsis diagnosis based on a biomarker expression profile or an infection Z-score or sepsis score, as described herein.

In one embodiment, the invention includes a method of treating a subject having sepsis, the method comprising: a) diagnosing the subject with sepsis according to a method described herein; and b) administering a therapeutically effective amount of broad spectrum antibiotics to the subject if the subject has a positive sepsis diagnosis.

In another embodiment, the invention includes a method of treating a subject suspected of having sepsis, the method comprising: a) receiving information regarding the diagnosis of the subject according to a method described herein; and b) administering a therapeutically effective amount of broad spectrum antibiotics to the subject if the subject has a positive sepsis diagnosis.

In another embodiment, the invention includes a method for monitoring the efficacy of a therapy for treating sepsis in a subject, the method comprising measuring levels of expression of CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, C3AR1, KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1 biomarkers in a first sample derived from the subject before the subject undergoes the therapy and in a second sample derived from the subject after the subject undergoes the therapy. Increased levels of expression of the CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, and C3AR1 biomarkers and decreased levels of expression of the KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1 biomarkers in the second sample compared to the first sample indicate that the subject is worsening, whereas decreased levels of expression of the CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, and C3AR1 biomarkers and increased levels of expression of the KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1 biomarkers in the second sample compared to the first sample indicate that the subject is improving. The method may further comprise calculating a sepsis score for each sample, wherein a higher sepsis score for the second sample compared to the first sample indicates that the subject is worsening, and a lower sepsis score for the second sample compared to the first sample indicates that the subject is improving.
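The score-based monitoring rule above can be sketched as a comparison of sepsis scores computed on the pre-therapy and post-therapy samples. This is a hypothetical illustration: it reuses the sepsis score formula under the same assumption of positive expression levels, and the `"unchanged"` outcome for equal scores is an added assumption, since the text defines only worsening and improving.

```python
from math import prod

UP_GENES = ("CEACAM1", "ZDHHC19", "C9orf95", "GNA15", "BATF", "C3AR1")
DOWN_GENES = ("KIAA1370", "TGFBI", "MTCH1", "RPGRIP1", "HLA-DPB1")


def sepsis_score(expr: dict[str, float]) -> float:
    """Geometric mean of up genes minus 5/6 times geometric mean of down genes."""
    up = prod(expr[g] for g in UP_GENES) ** (1 / len(UP_GENES))
    down = prod(expr[g] for g in DOWN_GENES) ** (1 / len(DOWN_GENES))
    return up - (len(DOWN_GENES) / len(UP_GENES)) * down


def therapy_trend(before: dict[str, float], after: dict[str, float]) -> str:
    """Classify the change in sepsis score from the first to the second sample."""
    delta = sepsis_score(after) - sepsis_score(before)
    if delta > 0:
        return "worsening"   # higher score after therapy
    if delta < 0:
        return "improving"   # lower score after therapy
    return "unchanged"       # assumption: not defined in the text


# Hypothetical samples: up genes rise after therapy, so the score rises.
before = {g: 1.0 for g in UP_GENES + DOWN_GENES}
after = dict(before, **{g: 2.0 for g in UP_GENES})
print(therapy_trend(before, after))
```

In practice a minimum change, reflecting assay variability, would likely be required before calling a trend; any such threshold is outside what this section specifies.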

B. Detecting and Measuring Biomarkers

It is understood that the biomarkers in a sample can be measured by any suitable method known in the art. Measurement of the expression level of a biomarker can be direct or indirect. For example, the abundance levels of RNAs or proteins can be directly quantitated. Alternatively, the amount of a biomarker can be determined indirectly by measuring abundance levels of cDNAs, amplified RNAs or DNAs, or by measuring quantities or activities of RNAs, proteins, or other molecules (e.g., metabolites) that are indicative of the expression level of the biomarker. The methods for measuring biomarkers in a sample have many applications. For example, one or more biomarkers can be measured to aid in the diagnosis of sepsis, to determine the appropriate treatment for a subject, to monitor responses in a subject to treatment, or to identify therapeutic compounds that modulate expression of the biomarkers in vivo or in vitro.

Detecting Biomarker Polynucleotides

In one embodiment, the expression levels of the biomarkers are determined by measuring polynucleotide levels of the biomarkers. The levels of transcripts of specific biomarker genes can be determined from the amount of mRNA, or polynucleotides derived therefrom, present in a biological sample. Polynucleotides can be detected and quantitated by a variety of methods including, but not limited to, microarray analysis, polymerase chain reaction (PCR), reverse transcriptase polymerase chain reaction (RT-PCR), Northern blot, and serial analysis of gene expression (SAGE). See, e.g., Draghici, Data Analysis Tools for DNA Microarrays, Chapman and Hall/CRC, 2003; Simon et al., Design and Analysis of DNA Microarray Investigations, Springer, 2004; Real-Time PCR: Current Technology and Applications, Logan, Edwards, and Saunders eds., Caister Academic Press, 2009; Bustin, A-Z of Quantitative PCR (IUL Biotechnology, No. 5), International University Line, 2004; Velculescu et al. (1995) Science 270:484-487; Matsumura et al. (2005) Cell. Microbiol. 7:11-18; Serial Analysis of Gene Expression (SAGE): Methods and Protocols (Methods in Molecular Biology), Humana Press, 2008; herein incorporated by reference in their entireties.

In one embodiment, microarrays are used to measure the levels of biomarkers. An advantage of microarray analysis is that the expression of each of the biomarkers can be measured simultaneously, and microarrays can be specifically designed to provide a diagnostic expression profile for a particular disease or condition (e.g., sepsis).

Microarrays are prepared by selecting probes which comprise a polynucleotide sequence, and then immobilizing such probes to a solid support or surface. For example, the probes may comprise DNA sequences, RNA sequences, or copolymer sequences of DNA and RNA. The polynucleotide sequences of the probes may also comprise DNA and/or RNA analogues, or combinations thereof. For example, the polynucleotide sequences of the probes may be full or partial fragments of genomic DNA. The polynucleotide sequences of the probes may also be synthesized nucleotide sequences, such as synthetic oligonucleotide sequences. The probe sequences can be synthesized either enzymatically in vivo, enzymatically in vitro (e.g., by PCR), or non-enzymatically in vitro.

Probes used in the methods of the invention are preferably immobilized to a solid support which may be either porous or non-porous. For example, the probes may be polynucleotide sequences which are attached to a nitrocellulose or nylon membrane or filter covalently at either the 3′ or the 5′ end of the polynucleotide. Such hybridization probes are well known in the art (see, e.g., Sambrook, et al., Molecular Cloning: A Laboratory Manual (3rd Edition, 2001)). Alternatively, the solid support or surface may be a glass or plastic surface. In one embodiment, hybridization levels are measured to microarrays of probes consisting of a solid phase on the surface of which are immobilized a population of polynucleotides, such as a population of DNA or DNA mimics, or, alternatively, a population of RNA or RNA mimics. The solid phase may be a nonporous or, optionally, a porous material such as a gel.

In one embodiment, the microarray comprises a support or surface with an ordered array of binding (e.g., hybridization) sites or “probes,” each representing one of the biomarkers described herein. Preferably the microarrays are addressable arrays, and more preferably positionally addressable arrays. More specifically, each probe of the array is preferably located at a known, predetermined position on the solid support such that the identity (i.e., the sequence) of each probe can be determined from its position in the array (i.e., on the support or surface). Each probe is preferably covalently attached to the solid support at a single site.

Microarrays can be made in a number of ways, of which several are described below. However they are produced, microarrays share certain characteristics. The arrays are reproducible, allowing multiple copies of a given array to be produced and easily compared with each other. Preferably, microarrays are made from materials that are stable under binding (e.g., nucleic acid hybridization) conditions. Microarrays are generally small, e.g., between 1 cm² and 25 cm²; however, larger arrays may also be used, e.g., in screening arrays. Preferably, a given binding site or unique set of binding sites in the microarray will specifically bind (e.g., hybridize) to the product of a single gene in a cell (e.g., to a specific mRNA, or to a specific cDNA derived therefrom). However, in general, other related or similar sequences will cross-hybridize to a given binding site.

As noted above, the “probe” to which a particular polynucleotide molecule specifically hybridizes contains a complementary polynucleotide sequence. The probes of the microarray typically consist of nucleotide sequences of no more than 1,000 nucleotides. In some embodiments, the probes of the array consist of nucleotide sequences of 10 to 1,000 nucleotides. In one embodiment, the nucleotide sequences of the probes are in the range of 10-200 nucleotides in length and are genomic sequences of one species of organism, such that a plurality of different probes is present, with sequences complementary and thus capable of hybridizing to the genome of such a species of organism, sequentially tiled across all or a portion of the genome. In other embodiments, the probes are in the range of 10-30 nucleotides in length, in the range of 10-40 nucleotides in length, in the range of 20-50 nucleotides in length, in the range of 40-80 nucleotides in length, in the range of 50-150 nucleotides in length, in the range of 80-120 nucleotides in length, or are 60 nucleotides in length.
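The sequential tiling described above can be sketched as sliding a fixed-length window along a target DNA sequence and taking the reverse complement of each window, so that each probe is complementary to (and can hybridize with) its window. This is an illustrative sketch only: the function names, the step size, and the short example sequences are hypothetical, and real probe design would also screen for cross-hybridization, which this code does not do.

```python
# Watson-Crick pairing for DNA; an RNA target would use U in place of T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence that hybridizes antiparallel to `seq`."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_probes(target: str, probe_length: int, step: int) -> list[tuple[int, str]]:
    """Return (start position, probe) pairs tiled across `target`.

    Each probe is the reverse complement of target[start:start + probe_length];
    windows that would run past the end of the target are skipped.
    """
    target = target.upper()
    return [
        (start, reverse_complement(target[start:start + probe_length]))
        for start in range(0, len(target) - probe_length + 1, step)
    ]


# Toy example with 4-nt probes; a real design might use, e.g., 60-nt probes.
print(tile_probes("AAAACCCC", probe_length=4, step=4))
```

Choosing `step` equal to `probe_length` gives non-overlapping adjacent probes; a smaller step gives overlapping tiles and denser coverage of the target.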

The probes may comprise DNA or DNA “mimics” (e.g., derivatives and analogues) corresponding to a portion of an organism’s genome. In another embodiment, the probes of the microarray are complementary RNA or RNA mimics. DNA mimics are polymers composed of subunits capable of specific, Watson-Crick-like hybridization with DNA, or of specific hybridization with RNA. The nucleic acids can be modified at the base moiety, at the sugar moiety, or at the phosphate backbone (e.g., phosphorothioates).

DNA can be obtained, e.g., by polymerase chain reaction (PCR) amplification of genomic DNA or cloned sequences. PCR primers are preferably chosen based on a known sequence of the genome that will result in amplification of specific fragments of genomic DNA. Computer programs that are well known in the art are useful in the design of primers with the required specificity and optimal amplification properties, such as Oligo version 5.0 (National Biosciences). Typically each probe on the microarray will be between 10 bases and 50,000 bases, usually between 300 bases and 1,000 bases in length. PCR methods are well known in the art, and are described, for example, in Innis et al., eds., PCR Protocols: A Guide To Methods And Applications, Academic Press Inc., San Diego, Calif. (1990); herein incorporated by reference in its entirety. It will be apparent to one skilled in the art that controlled robotic systems are useful for isolating and amplifying nucleic acids.

An alternative, preferred means for generating polynucleotide probes is by synthesis of synthetic polynucleotides or oligonucleotides, e.g., using H-phosphonate or phosphoramidite chemistries (Froehler et al., Nucleic Acids Res. 14:5399-5407 (1986); McBride et al., Tetrahedron Lett. 24:246-248 (1983)). Synthetic sequences are typically between about 10 and about 500 bases in length, more typically between about 20 and about 100 bases, and most preferably between about 40 and about 70 bases in length. In some embodiments, synthetic nucleic acids include non-natural bases, such as, but by no means limited to, inosine. As noted above, nucleic acid analogues may be used as binding sites for hybridization. An example of a suitable nucleic acid analogue is peptide nucleic acid (see, e.g., Egholm et al., Nature 363:566-568 (1993); U.S. Pat. No. 5,539,083).

Probes are preferably selected using an algorithm that takes into account binding energies, base composition, sequence complexity, cross-hybridization binding energies, and secondary structure. See Friend et al., International Patent Publication WO 01/05935, published Jan. 25, 2001; Hughes et al., Nat. Biotech. 19:342-7 (2001).
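The cited algorithms score binding energies directly; the sketch below is not that method. It is a much cruder, hypothetical pre-filter that uses GC fraction for base composition, the longest single-base run for sequence complexity, and a short self-complementary stretch as a stand-in for secondary structure. All thresholds (`gc_min`, `gc_max`, `max_run`, `stem`) are illustrative assumptions.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases (base composition)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest single-base run (crude sequence-complexity proxy)."""
    return max(len(m.group(0)) for m in re.finditer(r"(.)\1*", seq.upper()))


def has_self_complement(seq: str, stem: int = 6) -> bool:
    """True if some stem-length window's reverse complement also occurs in seq.

    A rough proxy for hairpin or dimer potential, not a folding calculation."""
    s = seq.upper()
    windows = {s[i:i + stem] for i in range(len(s) - stem + 1)}
    return any(w.translate(_COMPLEMENT)[::-1] in windows for w in windows)


def passes_filters(seq: str, gc_min: float = 0.40, gc_max: float = 0.60,
                   max_run: int = 4, stem: int = 6) -> bool:
    """Hypothetical pre-filter combining the three proxies above."""
    return (gc_min <= gc_fraction(seq) <= gc_max
            and longest_homopolymer(seq) <= max_run
            and not has_self_complement(seq, stem))
```

A filter like this would only narrow the candidate pool; cross-hybridization energies against the rest of the transcriptome still require a dedicated thermodynamic model.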

A skilled artisan will also appreciate that positive control probes, e.g., probes known to be complementary and hybridizable to sequences in the target polynucleotide molecules, and negative control probes, e.g., probes known to not be complementary and hybridizable to sequences in the target polynucleotide molecules, should be included on the array. In one embodiment, positive controls are synthesized along the perimeter of the array. In another embodiment, positive controls are synthesized in diagonal stripes across the array. In still another embodiment, the reverse complement for each probe is synthesized next to the position of the probe to serve as a negative control. In yet another embodiment, sequences from other species of organism are used as negative controls or as “spike-in” controls.
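The reverse-complement negative-control layout can be sketched as interleaving each probe with its reverse complement. The helper names below are illustrative, and a one-dimensional list stands in for adjacent array positions.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the order."""
    return seq.translate(_COMPLEMENT)[::-1]


def with_adjacent_controls(probes: list[str]) -> list[tuple[str, str]]:
    """Place each probe's reverse complement right after it as a negative control."""
    layout = []
    for probe in probes:
        layout.append(("probe", probe))
        layout.append(("neg_control", reverse_complement(probe)))
    return layout


layout = with_adjacent_controls(["AACG", "GGTA"])
```

Because the reverse complement of a probe has the same sequence as the target strand it detects, it should not capture that labeled target, which is why it can serve as a negative control.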

The probes are attached to a solid support or surface, which may be made, e.g., from glass, plastic (e.g., polypropylene, nylon), polyacrylamide, nitrocellulose, gel, or other porous or nonporous material. One method for attaching nucleic acids to a surface is by printing on glass plates, as is described generally by Schena et al., Science 270:467-470 (1995). This method is especially useful for preparing microarrays of cDNA (see also DeRisi et al., Nature Genetics 14:457-460 (1996); Shalon et al., Genome Res. 6:639-645 (1996); and Schena et al., Proc. Natl. Acad. Sci. U.S.A. 93:10539-11286 (1995); herein incorporated by reference in their entireties).

A second method for making microarrays produces high-density oligonucleotide arrays. Techniques are known for producing arrays containing thousands of oligonucleotides complementary to defined sequences, at defined locations on a surface, using photolithographic techniques for synthesis in situ (see Fodor et al., 1991, Science 251:767-773; Pease et al., 1994, Proc. Natl. Acad. Sci. U.S.A. 91:5022-5026; Lockhart et al., 1996, Nature Biotechnology 14:1675; U.S. Pat. Nos. 5,578,832; 5,556,752; and 5,510,270; herein incorporated by reference in their entireties) or other methods for rapid synthesis and deposition of defined oligonucleotides (Blanchard et al., 1996, Biosensors & Bioelectronics 11:687-690; herein incorporated by reference in its entirety). When these methods are used, oligonucleotides (e.g., 60-mers) of known sequence are synthesized directly on a surface such as a derivatized glass slide. Usually, the array produced is redundant, with several oligonucleotide molecules per RNA.
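In light-directed in situ synthesis, each cycle couples one nucleotide to whichever features a mask leaves exposed. The following simplified sketch plans such a mask schedule by cycling through A, C, G, T and exposing every feature whose next base matches. The function name, the fixed base order, and the skipping of empty cycles are assumptions for illustration; real mask design also optimizes cycle count and edge effects.

```python
def mask_schedule(probes: dict[str, str], order: str = "ACGT") -> list[tuple[str, set[str]]]:
    """Return (base, exposed_feature_ids) for each coupling cycle.

    probes maps a feature id to the sequence to be built at that feature,
    written in synthesis order (first base coupled first)."""
    for fid, seq in probes.items():
        if set(seq) - set(order):
            raise ValueError(f"feature {fid} contains bases outside {order!r}")
    progress = {fid: 0 for fid in probes}  # bases already coupled per feature
    schedule = []
    while any(progress[f] < len(s) for f, s in probes.items()):
        for base in order:
            # The mask exposes features whose next required base is `base`
            exposed = {f for f, s in probes.items()
                       if progress[f] < len(s) and s[progress[f]] == base}
            if exposed:
                for f in exposed:
                    progress[f] += 1
                schedule.append((base, exposed))
    return schedule


schedule = mask_schedule({"f1": "AC", "f2": "CA"})
```

Each full pass through the four bases extends every unfinished feature by at least one base, so a set of n-mers needs at most 4n masks under this naive ordering.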

Other methods for making microarrays, e.g., by masking (Maskos and Southern, 1992, Nuc. Acids. Res. 20:1679-1684; herein incorporated by reference in its entirety), may also be used. In principle, any type of array, for example, dot blots on a nylon hybridization membrane (see Sambrook, et al., Molecular Cloning: A Laboratory Manual, 3rd Edition, 2001), could be used. However, as will be recognized by those skilled in the art, very small arrays will frequently be preferred because hybridization volumes will be smaller.

Microarrays can also be manufactured by means of an ink-jet printing device for oligonucleotide synthesis, e.g., using the methods and systems described by Blanchard in U.S. Pat. No. 6,028,189; Blanchard et al., 1996, Biosensors and Bioelectronics 11:687-690; Blanchard, 1998, in Synthetic DNA Arrays in Genetic Engineering, Vol. 20, J. K. Setlow, Ed., Plenum Press, New York at pages 111-123; herein incorporated by reference in their entireties. Specifically, the oligonucleotide probes in such microarrays are synthesized in arrays, e.g., on a glass slide, by serially depositing individual nucleotide bases in “microdroplets” of a high surface tension solvent such as propylene carbonate. The microdroplets have small volumes (e.g., 100 pL or less, more preferably 50 pL or less) and are separated from each other on the microarray (e.g., by hydrophobic domains) to form circular surface tension wells which define the locations of the array elements (i.e., the different probes). Microarrays manufactured by this ink-jet method are typically of high density, preferably having a density of at least about 2,500 different probes per 1 cm². The polynucleotide probes are attached to the support covalently at either the 3′ or the 5′ end of the polynucleotide.
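As a quick check on what the stated density implies for feature spacing, assuming a square grid of equally sized features (an assumption; the text does not specify the layout):

```latex
\frac{1\ \mathrm{cm}^2}{2500\ \text{features}}
  = 4\times10^{-4}\ \mathrm{cm}^2
  = 0.04\ \mathrm{mm}^2 \text{ per feature},
\qquad
\text{pitch} = \sqrt{0.04\ \mathrm{mm}^2} = 0.2\ \mathrm{mm} = 200\ \mu\mathrm{m}.
```

So a density of at least about 2,500 probes per cm² corresponds to a center-to-center spacing of at most about 200 µm.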

Biomarker polynucleotides which may be measured by microarray analysis can be expressed RNA or a nucleic acid derived therefrom (e.g., cDNA or amplified RNA derived from cDNA that incorporates an RNA polymerase promoter), including naturally occurring nucleic acid molecules, as well as synthetic nucleic acid molecules. In one embodiment, the target polynucleotide molecules comprise RNA, including, but by no means limited to, total cellular RNA, poly(A)⁺ messenger RNA (mRNA) or a fraction thereof, cytoplasmic mRNA, or RNA transcribed from cDNA (i.e., cRNA; see, e.g., Linsley & Schelter, U.S. Pat. Application Ser. No. 09/411,074, filed Oct. 4, 1999, or U.S. Pat. Nos. 5,545,522, 5,891,636, or 5,716,785). Methods for preparing total and poly(A)⁺ RNA are well known in the art, and are described generally, e.g., in Sambrook, et al., Molecular Cloning: A Laboratory Manual (3rd Edition, 2001). RNA can be extracted from a cell of interest using guanidinium thiocyanate lysis followed by CsCl centrifugation (Chirgwin et al., 1979, Biochemistry 18:5294-5299), a silica gel-based column (e.g., RNeasy (Qiagen, Valencia, Calif.) or StrataPrep (Stratagene, La Jolla, Calif.)), or phenol and chloroform, as described in Ausubel et al., eds., 1989, Current Protocols In Molecular Biology, Vol. III, Greene Publishing Associates, Inc., John Wiley & Sons, Inc., New York, at pp. 13.12.1-13.12.5. Poly(A)⁺ RNA can be selected, e.g., by selection with oligo-dT cellulose or, alternatively, by oligo-dT primed reverse transcription of total cellular RNA. RNA can be fragmented by methods known in the art, e.g., by incubation with ZnCl₂, to generate fragments of RNA.

In one embodiment, total RNA, mRNA, or nucleic acids derived therefrom are isolated from a sample taken from a sepsis patient. Biomarker polynucleotides that are poorly expressed in particular cells may be enriched using normalization techniques (Bonaldo et al., 1996, Genome Res. 6:791-806).

As described above, the biomarker polynucleotides can be detectably labeled at one or more nucleotides. Any method known in the art may be used to label the target polynucleotides. Preferably, this labeling incorporates the label uniformly along the length of the RNA, and more preferably, the labeling is carried out at a high degree of efficiency. For example, polynucleotides can be labeled by oligo-dT primed reverse transcription. Random primers (e.g., 9-mers) can be used in reverse transcription to uniformly incorporate labeled nucleotides over the full length of the polynucleotides. Alternatively, random primers may be used in conjunction with PCR methods or T7 promoter-based in vitro transcription methods in order to amplify polynucleotides.

The detectable label may be a luminescent label. For example, fluorescent labels, bioluminescent labels, chemiluminescent labels, and colorimetric labels may be used in the practice of the invention. Fluorescent labels that can be used include, but are not limited to, fluorescein, a phosphor, a rhodamine, or a polymethine dye derivative. Additionally, commercially available fluorescent labels including, but not limited to, fluorescent phosphoramidites such as FluorePrime (Amersham Pharmacia, Piscataway, N.J.), Fluoredite (Millipore, Bedford, Mass.), FAM (ABI, Foster City, Calif.), and Cy3 or Cy5 (Amersham Pharmacia, Piscataway, N.J.) can be used. Alternatively, the detectable label can be a radiolabeled nucleotide.

In one embodiment, biomarker polynucleotide molecules from a patient sample are labeled differentially from the corresponding polynucleotide molecules of a reference sample. The reference can comprise polynucleotide molecules from a normal biological sample (i.e., control sample, e.g., blood from a subject not having sepsis) or from a sepsis reference biological sample (e.g., blood from a subject having sepsis).

Nucleic acid hybridization and wash conditions are chosen so that the target polynucleotide molecules specifically bind or specifically hybridize to the complementary polynucleotide sequences of the array, preferably to the specific array site where their complementary DNA is located. Arrays containing double-stranded probe DNA situated thereon are preferably subjected to denaturing conditions to render the DNA single-stranded prior to contacting with the target polynucleotide molecules. Arrays containing single-stranded probe DNA (e.g., synthetic oligodeoxyribonucleic acids) may need to be denatured prior to contacting with the target polynucleotide molecules, e.g., to remove hairpins or dimers which form due to self-complementary sequences.

Optimal hybridization conditions will depend on the length (e.g., oligomer versus polynucleotide greater than 200 bases) and type (e.g., RNA or DNA) of probe and target nucleic acids. One of skill in the art will appreciate that as the oligonucleotides become shorter, it may become necessary to adjust their length to achieve a relatively uniform melting temperature for satisfactory hybridization results. General parameters for specific (i.e., stringent) hybridization conditions for nucleic acids are described in Sambrook, et al., Molecular Cloning: A Laboratory Manual (3rd Edition, 2001), and in Ausubel et al., Current Protocols In Molecular Biology, vol. 2, Current Protocols Publishing, New York (1994). Typical hybridization conditions for the cDNA microarrays of Schena et al. are hybridization in 5× SSC plus 0.2% SDS at 65° C. for four hours, followed by washes at 25° C. in low stringency wash buffer (1× SSC plus 0.2% SDS), followed by 10 minutes at 25° C. in higher stringency wash buffer (0.1× SSC plus 0.2% SDS) (Schena et al., Proc. Natl. Acad. Sci. U.S.A. 93:10614 (1996)). Useful hybridization conditions are also provided in, e.g., Tijssen, 1993, Hybridization With Nucleic Acid Probes, Elsevier Science Publishers B.V.; and Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press, San Diego, Calif. Particularly preferred hybridization conditions include hybridization at a temperature at or near the mean melting temperature of the probes (e.g., within 5° C., more preferably within 2° C.) in 1 M NaCl, 50 mM MES buffer (pH 6.5), 0.5% sodium sarcosine and 30% formamide.
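Hybridizing near the mean melting temperature works only if the probe set's Tm values cluster tightly. The sketch below estimates each probe's Tm with the basic GC-content approximation, Tm ≈ 64.9 + 41(n_GC − 16.4)/N (a swapped-in textbook formula for oligos longer than about 13 nt that ignores the salt and formamide in the buffer above), and flags probes outside a window around the mean. The function names and the 5 °C default window are illustrative choices.

```python
from statistics import mean


def basic_tm(seq: str) -> float:
    """Approximate Tm in deg C (basic GC formula; ignores salt and formamide)."""
    s = seq.upper()
    n_gc = s.count("G") + s.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(s)


def hybridization_check(probes: list[str], window_c: float = 5.0):
    """Return (mean Tm, probes whose Tm lies more than window_c from the mean)."""
    tms = {p: basic_tm(p) for p in probes}
    t_mean = mean(tms.values())
    outliers = [p for p, tm in tms.items() if abs(tm - t_mean) > window_c]
    return t_mean, outliers


t_mean, outliers = hybridization_check(["ACGT" * 5, "GC" * 10, "AT" * 10])
```

Here the GC-only and AT-only 20-mers sit about 20 °C on either side of the mean, so both would be redesigned (e.g., by adjusting their length, as the text suggests) before a single hybridization temperature is chosen.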

When fluorescently labeled gene products are used, the fluorescence emissions at each site of a microarray are preferably detected by scanning confocal laser microscopy. In one embodiment, a separate scan, using the appropriate excitation line, is carried out for each of the two fluorophores used. Alternatively, a laser may be used that allows simultaneous specimen illumination at wavelengths specific to the two fluorophores, and emissions from the two fluorophores can be analyzed simultaneously (see Shalon et al., 1996, “A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization,” Genome Research 6:639-645, which is incorporated by reference in its entirety for all purposes). Arrays can be scanned with a laser fluorescent scanner with a computer-controlled X-Y stage and a microscope objective. Sequential excitation of the two fluorophores is achieved with a multi-line, mixed gas laser, and the emitted light is split by wavelength and detected with two photomultiplier tubes. Fluorescence laser scanning devices are described in Schena et al., Genome Res. 6:639-645 (1996), and in other references cited herein. Alternatively, the fiber-optic bundle described by Ferguson et al., Nature Biotech. 14:1681-1684 (1996), may be used to monitor mRNA abundance levels at a large number of sites simultaneously.
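With two fluorophores, each spot yields a pair of intensities, commonly summarized as a log ratio. A minimal sketch, assuming one channel carries the patient sample and the other the reference sample, that a single background value per channel is subtracted, and that global median centering is used for normalization. These processing choices are illustrative and are not specified in the text.

```python
import math
from statistics import median


def log2_ratios(test, reference, test_bg=0.0, ref_bg=0.0, floor=1.0):
    """Per-spot log2(test / reference) after background subtraction.

    Corrected intensities below `floor` are raised to it, so the log is defined."""
    if len(test) != len(reference):
        raise ValueError("channel vectors must have the same length")
    ratios = []
    for t, r in zip(test, reference):
        t_corr = max(t - test_bg, floor)
        r_corr = max(r - ref_bg, floor)
        ratios.append(math.log2(t_corr / r_corr))
    return ratios


def median_center(values):
    """Global normalization: shift log ratios so that their median is 0."""
    m = median(values)
    return [v - m for v in values]


ratios = log2_ratios([400, 100, 50], [100, 100, 200])
```

Median centering assumes most spots are not differentially expressed, which is reasonable for a genome-scale array but less so for a small, targeted biomarker panel.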

In one embodiment, the invention includes a microarray comprising an oligonucleotide that hybridizes to a CEACAM1 polynucleotide, an oligonucleotide that hybridizes to a ZDHHC19 polynucleotide, an oligonucleotide that hybridizes to a C9orf95 polynucleotide, an oligonucleotide that hybridizes to a GNA15 polynucleotide, an oligonucleotide that hybridizes to a BATF polynucleotide, an oligonucleotide that hybridizes to a C3AR1 polynucleotide, an oligonucleotide that hybridizes to a KIAA1370 polynucleotide, an oligonucleotide that hybridizes to a TGFBI polynucleotide, an oligonucleotide that hybridizes to a MTCH1 polynucleotide, an oligonucleotide that hybridizes to a RPGRIP1 polynucleotide, and an oligonucleotide that hybridizes to a HLA-DPB1 polynucleotide.

Polynucleotides can also be analyzed by other methods including, but not limited to, northern blotting, nuclease protection assays, RNA fingerprinting, polymerase chain reaction, ligase chain reaction, Qbeta replicase, isothermal amplification method, strand displacement amplification, transcription based amplification systems, nuclease protection (S1 nuclease or RNAse protection assays), SAGE, as well as methods disclosed in International Publication Nos. WO 88/10315 and WO 89/06700, and International Applications Nos. PCT/US87/00880 and PCT/US89/01025; herein incorporated by reference in their entireties.

A standard Northern blot assay can be used to ascertain an RNA transcript size, identify alternatively spliced RNA transcripts, and determine the relative amounts of mRNA in a sample, in accordance with conventional Northern hybridization techniques known to those persons of ordinary skill in the art. In Northern blots, RNA samples are first separated by size by electrophoresis in an agarose gel under denaturing conditions. The RNA is then transferred to a membrane, cross-linked, and hybridized with a labeled probe. Nonisotopic or high specific activity radiolabeled probes can be used, including random-primed, nick-translated, or PCR-generated DNA probes, in vitro transcribed RNA probes, and oligonucleotides. Additionally, sequences with only partial homology (e.g., cDNA from a different species or genomic DNA fragments that might contain an exon) may be used as probes. The labeled probe, e.g., a radiolabeled cDNA, either containing the full-length, single-stranded DNA or a fragment of that DNA sequence, may be at least 20, at least 30, at least 50, or at least 100 consecutive nucleotides in length. The probe can be labeled by any of the many different methods known to those skilled in this art. The labels most commonly employed for these studies are radioactive elements, enzymes, chemicals that fluoresce when exposed to ultraviolet light, and others. A number of fluorescent materials are known and can be utilized as labels. These include, but are not limited to, fluorescein, rhodamine, auramine, Texas Red, AMCA blue and Lucifer Yellow. A particular detecting material is anti-rabbit antibody prepared in goats and conjugated with fluorescein through an isothiocyanate. Proteins can also be labeled with a radioactive element or with an enzyme. The radioactive label can be detected by any of the currently available counting procedures. Isotopes that can be used include, but are not limited to, ³H, ¹⁴C, ³²P, ³⁵S, ³⁶Cl, ⁵¹Cr, ⁵⁷Co, ⁵⁸Co, ⁵⁹Fe, ⁹⁰Y, ¹²⁵I, ¹³¹I, and ¹⁸⁶Re.
Enzyme labels are likewise useful, and can be detected by any of the presently utilized colorimetric, spectrophotometric, fluorospectrophotometric, amperometric or gasometric techniques. The enzyme is conjugated to the selected particle by reaction with bridging molecules such as carbodiimides, diisocyanates, glutaraldehyde and the like. Any enzymes known to one of skill in the art can be utilized. Examples of such enzymes include, but are not limited to, peroxidase, beta-D-galactosidase, urease, glucose oxidase plus peroxidase and alkaline phosphatase. U.S. Pat. Nos. 3,654,090, 3,850,752, and 4,016,043 are referred to by way of example for their disclosure of alternate labeling material and methods.

Nuclease protection assays (including both ribonuclease protection assays and S1 nuclease assays) can be used to detect and quantitate specific mRNAs. In nuclease protection assays, an antisense probe (e.g., radiolabeled or nonisotopically labeled) hybridizes in solution to an RNA sample. Following hybridization, single-stranded, unhybridized probe and RNA are degraded by nucleases. An acrylamide gel is used to separate the remaining protected fragments. Typically, solution hybridization is more efficient than membrane-based hybridization, and it can accommodate up to 100 µg of sample RNA, compared with the 20-30 µg maximum of blot hybridizations.

The ribonuclease protection assay, which is the most common type of nuclease protection assay, requires the use of RNA probes. Oligonucleotides and other single-stranded DNA probes can only be used in assays containing S1 nuclease. The single-stranded, antisense probe must typically be completely homologous to target RNA to prevent cleavage of the probe:target hybrid by nuclease.

Serial Analysis of Gene Expression (SAGE) can also be used to determine RNA abundances in a cell sample. See, e.g., Velculescu et al., 1995, Science 270:484-7; Carulli, et al., 1998, Journal of Cellular Biochemistry Supplements 30/31:286-96; herein incorporated by reference in their entireties. SAGE analysis does not require a special device for detection, and is a preferred analytical method for simultaneously detecting the expression of a large number of transcription products. First, poly(A)⁺ RNA is extracted from cells. Next, the RNA is converted into cDNA using a biotinylated oligo(dT) primer, and treated with a four-base recognizing restriction enzyme (Anchoring Enzyme: AE), resulting in AE-treated fragments containing a biotin group at their 3′ terminus. Next, the AE-treated fragments are incubated with streptavidin for binding. The bound cDNA is divided into two fractions, and each fraction is then linked to a different double-stranded oligonucleotide adapter (linker) A or B. These linkers are composed of: (1) a protruding single-strand portion having a sequence complementary to the sequence of the protruding portion formed by the action of the anchoring enzyme, (2) a recognition sequence for a type IIS restriction enzyme (which cleaves at a predetermined location no more than 20 bp away from its recognition site) serving as a tagging enzyme (TE), and (3) an additional sequence of sufficient length for constructing a PCR-specific primer. The linker-linked cDNA is cleaved using the tagging enzyme, and only the linker-linked cDNA sequence portion remains, which is present in the form of a short-strand sequence tag. Next, pools of short-strand sequence tags from the two different types of linkers are linked to each other, followed by PCR amplification using primers specific to linkers A and B. As a result, the amplification product is obtained as a mixture comprising myriad sequences of two adjacent sequence tags (ditags) bound to linkers A and B.
The amplification product is treated with the anchoring enzyme, and the free ditag portions are linked into strands in a standard linkage reaction. The amplification product is then cloned. Determination of the clone’s nucleotide sequence can be used to obtain a read-out of consecutive ditags of constant length. The presence of mRNA corresponding to each tag can then be identified from the nucleotide sequence of the clone and information on the sequence tags.
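The read-out step, going from a sequenced clone of concatenated ditags to tag counts, can be sketched as follows. This sketch assumes the anchoring-enzyme site is CATG, that tags are 10 bp long excluding that site, that the second tag in each ditag is read on the opposite strand, and that repeated identical ditags are PCR duplicates to be counted once. These parameters follow common SAGE practice but are not stated in the text above.

```python
from collections import Counter

ANCHOR = "CATG"  # assumption: 4-base anchoring-enzyme site
TAG_LEN = 10     # assumption: tag length excluding the anchor site

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def count_tags(concatemer: str) -> Counter:
    """Count SAGE tags in one sequenced clone of anchor-separated ditags."""
    counts: Counter = Counter()
    seen = set()
    for ditag in concatemer.upper().split(ANCHOR):
        if len(ditag) < 2 * TAG_LEN:
            continue  # empty piece at the ends, or too short to hold two tags
        if ditag in seen:
            continue  # repeated ditag treated as a PCR duplicate
        seen.add(ditag)
        counts[ditag[:TAG_LEN]] += 1                       # first tag, read 5'->3'
        counts[reverse_complement(ditag[-TAG_LEN:])] += 1  # second tag, other strand
    return counts


d1 = "AAAAACCCCC" + reverse_complement("GAGAGTCTCT")
d2 = "AAAAACCCCC" + reverse_complement("TTTTTTTTTT")
clone = "CATG" + d1 + "CATG" + d2 + "CATG" + d1 + "CATG"
tags = count_tags(clone)
```

Summing these counts across many clones gives the relative abundance of each tag, and each tag is then matched to its transcript using sequence databases.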

Quantitative reverse transcriptase PCR (qRT-PCR) can also be used to determine the expression profiles of biomarkers (see, e.g., U.S. Pat. Application Publication No. 2005/0048542A1; herein incorporated by reference in its entirety). The first step in gene expression profiling by RT-PCR is the reverse transcription of the RNA template into cDNA, followed by its exponential amplification in a PCR reaction. The two most commonly used reverse transcriptases are avian myeloblastosis virus reverse transcriptase (AMV-RT) and Moloney murine leukemia virus reverse transcriptase (MLV-RT). The reverse transcription step is typically primed using specific primers, random hexamers, or oligo-dT primers, depending on the circumstances and the goal of expression profiling. For example, extracted RNA can be reverse-transcribed using a GeneAmp RNA PCR kit (Perkin Elmer, Calif., USA), following the manufacturer’s instructions. The derived cDNA can then be used as a template in the subsequent PCR reaction.

Although the PCR step can use a variety of thermostable DNA-dependent DNA polymerases, it typically employs the Taq DNA polymerase, which has a 5′-3′ nuclease activity but lacks a 3′-5′ proofreading exonuclease activity. Thus, TAQMAN PCR typically utilizes the 5′-nuclease activity of Taq or Tth polymerase to hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with equivalent 5′ nuclease activity can be used. Two oligonucleotide primers are used to generate an amplicon typical of a PCR reaction. A third oligonucleotide, or probe, is designed to detect a nucleotide sequence located between the two PCR primers. The probe is non-extendible by Taq DNA polymerase enzyme, and is labeled with a reporter fluorescent dye and a quencher fluorescent dye. Any laser-induced emission from the reporter dye is quenched by the quenching dye when the two dyes are located close together as they are on the probe. During the amplification reaction, the Taq DNA polymerase enzyme cleaves the probe in a template-dependent manner. The resultant probe fragments dissociate in solution, and signal from the released reporter dye is free from the quenching effect of the second fluorophore. One molecule of reporter dye is liberated for each new molecule synthesized, and detection of the unquenched reporter dye provides the basis for quantitative interpretation of the data.

TAQMAN RT-PCR can be performed using commercially available equipment, such as, for example, the ABI PRISM 7700 sequence detection system (Perkin-Elmer-Applied Biosystems, Foster City, Calif., USA) or the Lightcycler (Roche Molecular Biochemicals, Mannheim, Germany). In a preferred embodiment, the 5′ nuclease procedure is run on a real-time quantitative PCR device such as the ABI PRISM 7700 sequence detection system. The system consists of a thermocycler, laser, charge-coupled device (CCD) camera, and computer. The system includes software for running the instrument and for analyzing the data. 5′-Nuclease assay data are initially expressed as Ct, or the threshold cycle. Fluorescence values are recorded during every cycle and represent the amount of product amplified to that point in the amplification reaction. The point when the fluorescent signal is first recorded as statistically significant is the threshold cycle (Ct).
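The threshold-cycle definition can be made concrete with a small sketch. The text says only that the threshold marks a "statistically significant" signal. Here, as an assumption loosely modeled on a common convention, the threshold is set at the baseline mean plus k standard deviations over early cycles, and Ct is interpolated linearly between the two cycles that bracket the crossing. The function names and the defaults (cycles 3 to 15, k = 10) are illustrative.

```python
from statistics import mean, stdev


def baseline_threshold(fluorescence, start=3, end=15, k=10.0):
    """Threshold = mean + k * SD over baseline cycles start..end (1-based, inclusive)."""
    baseline = fluorescence[start - 1:end]
    return mean(baseline) + k * stdev(baseline)


def threshold_cycle(fluorescence, threshold):
    """Fractional cycle at which fluorescence first reaches the threshold.

    fluorescence[0] is cycle 1. Returns None if the threshold is never reached."""
    if fluorescence and fluorescence[0] >= threshold:
        return 1.0
    for i in range(1, len(fluorescence)):
        prev, cur = fluorescence[i - 1], fluorescence[i]  # cycles i and i + 1
        if prev < threshold <= cur:
            return i + (threshold - prev) / (cur - prev)
    return None


ct = threshold_cycle([1, 2, 4, 8, 16], threshold=6)
```

Because product roughly doubles each cycle during exponential amplification, a sample with more starting template crosses the threshold at an earlier (smaller) Ct.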

To minimize errors and the effect of sample-to-sample variation, RT-PCR is usually performed using an internal standard. The ideal internal standard is expressed at a constant level among different tissues, and is unaffected by the experimental treatment. RNAs most frequently used to normalize patterns of gene expression are mRNAs for the housekeeping genes glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and beta-actin.

A more recent variation of the RT-PCR technique is real-time quantitative PCR, which measures PCR product accumulation through a dual-labeled fluorogenic probe (i.e., TAQMAN probe). Real-time PCR is compatible both with quantitative competitive PCR, where an internal competitor for each target sequence is used for normalization, and with quantitative comparative PCR using a normalization gene contained within the sample, or a housekeeping gene for RT-PCR. For further details see, e.g., Held et al., Genome Research 6:986-994 (1996).
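One widely used way to combine Ct values with a housekeeping-gene normalizer is the comparative Ct (2^-ΔΔCt) method. It is swapped in here as an illustration and is not necessarily the analysis used in this disclosure. The sketch assumes that both the target and reference assays amplify with the same efficiency (`efficiency=2.0` means perfect doubling per cycle). The function name and the example Ct values are hypothetical.

```python
def relative_quantity(ct_target, ct_reference, ct_target_cal, ct_reference_cal,
                      efficiency=2.0):
    """Fold change of a target gene in a sample relative to a calibrator sample.

    Each Ct is first normalized to a housekeeping reference gene (e.g., GAPDH)
    measured in the same sample; efficiency=2.0 assumes perfect doubling."""
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    delta_delta = delta_sample - delta_calibrator
    return efficiency ** (-delta_delta)


# Hypothetical values: target Ct 22 vs. GAPDH 18 in a patient sample,
# target Ct 25 vs. GAPDH 18 in a control (calibrator) sample.
fold = relative_quantity(22.0, 18.0, 25.0, 18.0)
```

The target reaches threshold three cycles earlier in the patient after normalization, which corresponds to roughly 2³ = 8-fold higher expression under the equal-efficiency assumption.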

Analysis of Biomarker Data

Biomarker data may be analyzed by a variety of methods to identify biomarkers and determine the statistical significance of differences in observed levels of biomarkers between test and reference expression profiles in order to evaluate whether a patient has sepsis or systemic inflammation arising from a noninfectious source, such as traumatic injury, surgery, autoimmune disease, thrombosis, or systemic inflammatory response syndrome (SIRS). In certain embodiments, patient data is analyzed by one or more methods including, but not limited to, multivariate linear discriminant analysis (LDA), receiver operating characteristic (ROC) analysis, principal component analysis (PCA), ensemble data mining methods, significance analysis of microarrays (SAM), cell specific significance analysis of microarrays (csSAM), spanning-tree progression analysis of density-normalized events (SPADE), and multi-dimensional protein identification technology (MUDPIT) analysis. (See, e.g., Hilbe (2009) Logistic Regression Models, Chapman & Hall/CRC Press; McLachlan (2004) Discriminant Analysis and Statistical Pattern Recognition. Wiley Interscience; Zweig et al. (1993) Clin. Chem. 39:561-577; Pepe (2003) The statistical evaluation of medical tests for classification and prediction, New York, NY: Oxford; Sing et al. (2005) Bioinformatics 21:3940-3941; Tusher et al. (2001) Proc. Natl. Acad. Sci. U.S.A. 98:5116-5121; Oza (2006) Ensemble data mining, NASA Ames Research Center, Moffett Field, CA, USA; English et al. (2009) J. Biomed. Inform. 42(2):287-295; Zhang (2007) Bioinformatics 8: 230; Shen-Orr et al. (2010) Journal of Immunology 184:144-130; Qiu et al. (2011) Nat. Biotechnol. 29(10):886-891; Ru et al. (2006) J. Chromatogr. A. 1111(2):166-174, Jolliffe Principal Component Analysis (Springer Series in Statistics, 2nd edition, Springer, NY, 2002), Koren et al. (2004) IEEE Trans Vis Comput Graph 10:459-470; herein incorporated by reference in their entireties.)
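Of the methods listed, ROC analysis is the one most directly tied to diagnostic accuracy: the area under the ROC curve (AUC) equals the probability that a randomly chosen sepsis patient scores higher than a randomly chosen non-infected patient. A minimal sketch computes the AUC through that pairwise (Mann-Whitney) formulation, counting ties as one half. The function name and example data are illustrative, and the quadratic loop is meant for clarity, not for large cohorts.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 1/2.

    labels: truthy for sepsis (positive), falsy for non-infectious inflammation."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

An AUC of 0.5 means the score is no better than chance, and 1.0 means every positive case outranks every negative case.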

C. Kits

In yet another aspect, the invention provides kits for diagnosing sepsis, wherein the kits can be used to detect the biomarkers of the present invention. For example, the kits can be used to detect any one or more of the biomarkers described herein, which are differentially expressed in samples of a sepsis patient and healthy or non-infected subjects. The kit may include one or more agents for detection of biomarkers, a container for holding a biological sample isolated from a human subject suspected of having sepsis, and printed instructions for reacting agents with the biological sample or a portion of the biological sample to detect the presence or amount of at least one sepsis biomarker in the biological sample. The agents may be packaged in separate containers. The kit may further comprise one or more control reference samples and reagents for performing an immunoassay or microarray analysis.

In certain embodiments, the kit comprises agents for measuring the levels of at least eleven biomarkers of interest. For example, the kit may include agents for detecting biomarkers of a panel comprising a CEACAM1 polynucleotide, a ZDHHC19 polynucleotide, a C9orf95 polynucleotide, a GNA15 polynucleotide, a BATF polynucleotide, a C3AR1 polynucleotide, a KIAA1370 polynucleotide, a TGFBI polynucleotide, a MTCH1 polynucleotide, a RPGRIP1 polynucleotide, and a HLA-DPB1 polynucleotide. In addition, the kit may include agents for detecting more than one biomarker panel, such as two or three biomarker panels, which can be used alone or together in any combination, and/or in combination with clinical parameters for diagnosis of sepsis.

In certain embodiments, the kit comprises a microarray for analysis of a plurality of biomarker polynucleotides. An exemplary microarray included in the kit comprises an oligonucleotide that hybridizes to a CEACAM1 polynucleotide, an oligonucleotide that hybridizes to a ZDHHC19 polynucleotide, an oligonucleotide that hybridizes to a C9orf95 polynucleotide, an oligonucleotide that hybridizes to a GNA15 polynucleotide, an oligonucleotide that hybridizes to a BATF polynucleotide, an oligonucleotide that hybridizes to a C3AR1 polynucleotide, an oligonucleotide that hybridizes to a KIAA1370 polynucleotide, an oligonucleotide that hybridizes to a TGFBI polynucleotide, an oligonucleotide that hybridizes to a MTCH1 polynucleotide, an oligonucleotide that hybridizes to a RPGRIP1 polynucleotide, and an oligonucleotide that hybridizes to a HLA-DPB1 polynucleotide.

The kit can comprise one or more containers for compositions contained in the kit. Compositions can be in liquid form or can be lyophilized. Suitable containers for the compositions include, for example, bottles, vials, syringes, and test tubes. Containers can be formed from a variety of materials, including glass or plastic. The kit can also comprise a package insert containing written instructions for methods of diagnosing sepsis.

The kits of the invention have a number of applications. For example, the kits can be used to determine if a subject has sepsis or some other inflammatory condition arising from a noninfectious source, such as traumatic injury, surgery, autoimmune disease, thrombosis, or systemic inflammatory response syndrome (SIRS). In another example, the kits can be used to determine if a patient should be treated for sepsis, for example, with broad spectrum antibiotics. In another example, kits can be used to monitor the effectiveness of treatment of a patient having sepsis. In a further example, the kits can be used to identify compounds that modulate expression of one or more of the biomarkers in in vitro or in vivo animal models to determine the effects of treatment.

D. Diagnostic System and Computerized Methods for Diagnosis of Sepsis

In a further aspect, the invention includes a computer-implemented method for diagnosing a patient suspected of having sepsis. The computer performs steps comprising: receiving inputted patient data comprising values for the levels of one or more sepsis biomarkers in a biological sample from the patient; analyzing the levels of one or more sepsis biomarkers and comparing them with respective reference value ranges for the sepsis biomarkers; calculating an infection Z-score or sepsis score for the patient; calculating the likelihood that the patient has sepsis; and displaying information regarding the diagnosis of the patient. In certain embodiments, the inputted patient data comprises values for the levels of a plurality of sepsis biomarkers in a biological sample from the patient. In one embodiment, the inputted patient data comprises values for the levels of CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, C3AR1, KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1 polynucleotides.
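The score calculation is defined in the Examples rather than in this section. The following is only a hypothetical sketch of one common form of such a score: the geometric mean of the biomarkers assumed to rise with infection minus the geometric mean of those assumed to fall, standardized against a reference cohort. It also covers the comparison against reference ranges. The split into "up" and "down" genes, the function names, and the standardization step are all assumptions, and the final likelihood step is not modeled.

```python
from math import prod
from statistics import mean, stdev


def geometric_mean(values):
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return prod(values) ** (1.0 / len(values))


def raw_score(levels, up_genes, down_genes):
    """Geometric mean of 'up' biomarkers minus geometric mean of 'down' biomarkers."""
    return (geometric_mean([levels[g] for g in up_genes])
            - geometric_mean([levels[g] for g in down_genes]))


def z_score(value, reference_scores):
    """Standardize a raw score against scores from a reference cohort."""
    return (value - mean(reference_scores)) / stdev(reference_scores)


def out_of_range(levels, ranges):
    """Biomarkers whose level lies outside its (low, high) reference range."""
    return {g: v for g, v in levels.items()
            if g in ranges and not ranges[g][0] <= v <= ranges[g][1]}


# Hypothetical grouping for illustration only; not stated in this section.
UP = ["CEACAM1", "ZDHHC19", "C9orf95", "GNA15", "BATF", "C3AR1"]
DOWN = ["KIAA1370", "TGFBI", "MTCH1", "RPGRIP1", "HLA-DPB1"]
```

In the diagnostic system of FIG. 16, functions like these would correspond to instructions held in the storage component 120 and executed by the processor 130, with the resulting score passed to the display component 150.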

In a further aspect, the invention includes a diagnostic system for performing the computer-implemented method, as described. As shown in FIG. 16, a diagnostic system 100 includes a computer 110 containing a processor 130, a storage component (i.e., memory) 120, a display component 150, and other components typically present in general-purpose computers. The storage component 120 stores information accessible by the processor 130, including instructions that may be executed by the processor 130 and data that may be retrieved, manipulated, or stored by the processor.

The storage component includes instructions for determining the diagnosis of the subject. For example, the storage component includes instructions for calculating an infection Z-score or sepsis score for the subject based on biomarker expression levels, as described herein (see Examples 1 and 2). In addition, the storage component may further comprise instructions for performing multivariate linear discriminant analysis (LDA), receiver operating characteristic (ROC) analysis, principal component analysis (PCA), ensemble data mining methods, cell-specific significance analysis of microarrays (csSAM), or multi-dimensional protein identification technology (MUDPIT) analysis. The computer processor 130 is coupled to the storage component 120 and configured to execute the instructions stored in the storage component in order to receive patient data and analyze patient data according to one or more algorithms. The display component 150 displays information regarding the diagnosis of the patient.
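As one example of stored ROC-analysis instructions, the ROC AUC can be computed without building the curve explicitly, as the probability that a randomly chosen infected sample scores higher than a randomly chosen non-infected one (the Mann-Whitney formulation). The sketch below assumes scores are plain Python floats and counts ties as one half; the function name `roc_auc` is hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """ROC AUC as P(random positive score > random negative score), ties counted as 0.5.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg.
    This O(n_pos * n_neg) loop is for clarity, not speed.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

For example, `roc_auc([2.1, 1.5, 0.9], [0.3, 1.0, -0.2])` gives 8/9, because only the pair (0.9, 1.0) is misordered.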

The storage component 120 may be of any type capable of storing information accessible by the processor, such as a hard drive, memory card, ROM, RAM, DVD, CD-ROM, USB flash drive, write-capable, and read-only memories. The processor 130 may be any well-known processor, such as processors from Intel Corporation. Alternatively, the processor may be a dedicated controller such as an ASIC.

The instructions may be any set of instructions to be executed directly (such as machine code) or indirectly (such as scripts) by the processor. In that regard, the terms “instructions,” “steps” and “programs” may be used interchangeably herein. The instructions may be stored in object code form for direct processing by the processor, or in any other computer language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance.

Data may be retrieved, stored or modified by the processor 130 in accordance with the instructions. For instance, although the diagnostic system is not limited by any particular data structure, the data may be stored in computer registers, in a relational database as a table having a plurality of different fields and records, XML documents, or flat files. The data may also be formatted in any computer-readable format such as, but not limited to, binary values, ASCII or Unicode. Moreover, the data may comprise any information sufficient to identify the relevant information, such as numbers, descriptive text, proprietary codes, pointers, references to data stored in other memories (including other network locations) or information which is used by a function to calculate the relevant data.
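To illustrate the relational-table option, the sketch below stores biomarker levels in an SQLite table using Python's built-in `sqlite3` module. The schema (a hypothetical `biomarker_level` table with patient, sampling time, gene, and value fields) and the helper names are assumptions for illustration, not a structure described in the text.

```python
import sqlite3


def create_store(path=":memory:"):
    """Open an SQLite database and create a table of biomarker measurements."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS biomarker_level (
               patient_id    TEXT NOT NULL,
               sample_time_h REAL NOT NULL,  -- hours since admission
               gene          TEXT NOT NULL,
               value         REAL NOT NULL,
               PRIMARY KEY (patient_id, sample_time_h, gene)
           )"""
    )
    return conn


def store_levels(conn, patient_id, sample_time_h, levels):
    """Insert or update one record per biomarker for a single sample."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO biomarker_level VALUES (?, ?, ?, ?)",
            [(patient_id, sample_time_h, gene, value) for gene, value in levels.items()],
        )


def load_levels(conn, patient_id, sample_time_h):
    """Return {gene: value} for one patient sample."""
    rows = conn.execute(
        "SELECT gene, value FROM biomarker_level"
        " WHERE patient_id = ? AND sample_time_h = ?",
        (patient_id, sample_time_h),
    )
    return dict(rows.fetchall())
```

The composite primary key makes a re-measured biomarker replace the earlier record for the same sample instead of duplicating it.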

In certain embodiments, the processor and storage component may comprise multiple processors and storage components that may or may not be stored within the same physical housing. For example, some of the instructions and data may be stored on removable CD-ROM and others within a read-only computer chip. Some or all of the instructions and data may be stored in a location physically remote from, yet still accessible by, the processor. Similarly, the processor may actually comprise a collection of processors which may or may not operate in parallel.

In one aspect, computer 110 is a server communicating with one or more client computers 140, 170. Each client computer may be configured similarly to the server 110, with a processor, storage component and instructions. Each client computer 140, 170 may be a personal computer, intended for use by a person 190-191, having all the internal components normally found in a personal computer such as a central processing unit (CPU), display 150 (for example, a monitor displaying information processed by the processor), CD-ROM, hard drive, user input device (for example, a mouse, keyboard, touch-screen or microphone) 160, speakers, modem and/or network interface device (telephone, cable or otherwise) and all of the components used for connecting these elements to one another and permitting them to communicate (directly or indirectly) with one another. Moreover, computers in accordance with the systems and methods described herein may comprise any device capable of processing instructions and transmitting data to and from humans and other computers including network computers lacking local storage capability.

Although the client computers 140 and 170 may comprise a full-sized personal computer, many aspects of the system and method are particularly advantageous when used in connection with mobile devices capable of wirelessly exchanging data with a server over a network such as the Internet. For example, client computer 170 may be a wireless-enabled PDA such as a BlackBerry phone, Apple iPhone, or other Internet-capable cellular phone. In such regard, the user may input information using a small keyboard, a keypad, a touch screen, or any other means of user input. The computer may have an antenna 180 for receiving a wireless signal.

The server 110 and client computers 140, 170 are capable of direct and indirect communication, such as over a network 200. Although only a few computers are depicted in FIG. 16, it should be appreciated that a typical system can include a large number of connected computers, with each different computer being at a different node of the network 200. The network, and intervening nodes, may comprise various combinations of devices and communication protocols including the Internet, World Wide Web, intranets, virtual private networks, wide area networks, local networks, cell phone networks, private networks using communication protocols proprietary to one or more companies, Ethernet, WiFi and HTTP. Such communication may be facilitated by any device capable of transmitting data to and from other computers, such as modems (e.g., dial-up or cable), networks and wireless interfaces. Server 110 may be a web server.

Although certain advantages are obtained when information is transmitted or received as noted above, other aspects of the system and method are not limited to any particular manner of transmission of information. For example, in some aspects, information may be sent via a medium such as a disk, tape, flash drive, DVD, or CD-ROM. In other aspects, the information may be transmitted in a non-electronic format and manually entered into the system. Yet further, although some functions are indicated as taking place on a server and others on a client, various aspects of the system and method may be implemented by a single computer having a single processor.

III. Experimental

Below are examples of specific embodiments for carrying out the present invention. The examples are offered for illustrative purposes only, and are not intended to limit the scope of the present invention in any way.

Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperatures, etc.), but some experimental error and deviation should, of course, be allowed for.

Example 1

A Comprehensive Time-Course-Based Multi-Cohort Analysis of Sepsis and Sterile Inflammation Reveals a Robust Diagnostic Gene Set

Introduction

We hypothesized that only time-matched comparisons, such as those that compare SIRS/trauma to sepsis at the same clinical time-points, would yield genes robustly diagnostic of sepsis. We carried out a comprehensive, time-course-based multi-cohort analysis of the publicly available gene expression data in sepsis and identified a conserved 11-gene set that can robustly distinguish non-infectious inflammation (such as SIRS, trauma, and ICU admissions) from inflammation due to acute infections, as in sepsis. This 11-gene set had excellent diagnostic power in the discovery cohorts and was then validated in 15 independent cohorts.

Results

Comprehensive Search and Labelled Principal Components Analysis (PCA) Visualizations

We identified 27 independent gene expression datasets that satisfied our criteria in GEO and ArrayExpress, from which a total of 2,903 microarrays were included (Pankla et al. (2009) Genome Biol 10:R127; Tang et al. (2009) Crit Care Med 37:882-888; Cvijanovich et al. (2008) Physiol Genomics 34:127-134; Shanley et al. (2007) Mol Med 13:495-508; Wong et al. (2007) Physiol Genomics 30:146-155; Wong et al. (2009) Crit Care Med 37:1558-1566; Wong et al. (2010) Pediatr Crit Care Med 11:349-355; Wong et al. (2011) Crit Care Med 39:2511-2517; Almansa et al. (2014) J Crit Care 29:307-309; Bermejo-Martin et al. (2010) Crit Care 14:R167; Martin-Loeches et al. (2012) Med Intensiva 36:257-263; Tamayo et al. (2012) J Crit Care 27:616-622; Hu et al. (2013) Proc Natl Acad Sci USA 110:12792-12797; Parnell et al. (2012) Crit Care 16:R157; Sutherland et al. (2011) Crit Care 15:R149; Tang et al. (2006) J Cereb Blood Flow Metab 26:1089-1102; Tang et al. (2007) Am J Respir Crit Care Med 176:676-684; Ahn et al. (2013) PLoS One 8:e48979; Dolinay et al. (2012) Am J Respir Crit Care Med 185:1225-1234; Berdal et al. (2011) J Infect 63:308-316; Berry et al. (2010) Nature 466:973-977; Fredriksson et al. (2008) PLoS One 3:e3686; McDunn et al. (2008) PLoS One 3:e1564; Chung et al. (2006) J Am Coll Surg 203:585-598; Parnell et al. (2011) PLoS One 6:e17186; and Emonts, Ph.D. thesis, Erasmus University Rotterdam (2008); herein incorporated by reference in their entireties). These 27 datasets comprised only 22 independent cohorts, as the six datasets from the Genomics of Pediatric SIRS/Septic Shock Investigators (GPSSSI) were combined into a single cohort containing 219 patients with SIRS or sepsis (Cvijanovich et al., supra; Shanley et al., supra; Wong et al. (2007), supra; Wong et al. (2009), supra; Wong et al. (2010), supra; Wong et al. (2011), supra).
Many of the samples used were from the Glue Grant trauma datasets, which have a total of 333 patients sampled at up to 8 time-points (1301 samples used here) after traumatic injury. These 27 datasets contain cohorts of children and adults, men and women, with a mix of community-acquired and hospital-acquired sepsis, sampled from whole blood, neutrophils, and PBMCs.

First, we sought to use the simplest possible methods to see whether non-infected SIRS/trauma patients and sepsis/infection patients could be separated by gene expression. We thus co-normalized all available datasets comparing SIRS/trauma with sepsis/infection in a single matrix. Labeled PCA (using 168 genes identified by 10-fold cross-validated Lasso-penalized logistic regression) showed that SIRS/trauma patients can be separated from sepsis patients with modest overlap (FIG. 1A). Next, we labeled each sample as “early” (within 48 hours of admission) or “late” (more than 48 hours after admission). The majority of the non-separable samples were the ‘late’ samples (FIG. 1B). This finding remained true even when we included healthy patients as a separate class (FIG. 7). Prior work has shown that gene expression after trauma, burns, or endotoxemia changes non-linearly over time (Cobb et al., supra; Xiao et al., supra; Seok et al., supra; Desai et al., supra; and McDunn et al., supra). This continuous change in expression after the initial insult could explain the inability to distinguish non-infected SIRS/trauma from sepsis in the ‘late’ samples if all time-points are treated as equal.

Therefore, we sought to get a qualitative sense of whether gene expression during the hospital course after injury is similar among different cohorts. We included all peripheral blood datasets that examined gene expression longitudinally over time after admission for non-septic events. We used CUR matrix decomposition to identify the 100 genes that were most orthogonal to each other, and used these to perform labelled PCA with classes determined by days post-injury. Reassuringly, the gene expression group at each time-point was closest to the time-points by which it was bounded (for example, the days [1,2) group was preceded by days [0,1) and followed by days [2,3)). Furthermore, changes in expression over time explained most variance in the datasets, as evidenced by the different day-groups changing in each of the first three labeled principal components. In summary, our analysis showed that the changes in gene expression after trauma/ICU admissions (1) proceed in a nonlinear fashion over time, and (2) show similar changes over time across datasets.

Time-Matched Multi-Cohort Analysis

Since changes in gene expression after admission for trauma explain a large amount of variance in the dataset, and since these changes proceed nonlinearly, direct comparisons of a patient at admission with that same patient several days later at the time of infection would be confounded by ‘normal’ changes in expression due to recovery from the inciting event, as well as any ‘abnormal’ changes due to the hospital-acquired infection. It would be extremely difficult, if not impossible, to disentangle these changes. Consequently, comparisons that do not take clinical time into account will not yield biomarkers that can robustly discriminate infected from non-infected patients (FIG. 1). Therefore, we focused only on infection datasets that also included a time-matched non-infected cohort (to allow for direct time-matched comparisons). We thus separated the datasets into two groups: (1) datasets comparing patients at hospital admission for trauma, surgery, or critical illness versus patients at admission to the hospital for sepsis (GSE28750 (Sutherland et al., supra), GSE32707 (Dolinay et al., supra), GSE40012 (Parnell et al., supra), and the GPSSSI Unique combined datasets (n=408 samples) (Cvijanovich et al., supra; Shanley et al., supra; Wong et al. (2007), supra; Wong et al. (2009), supra; Wong et al. (2010), supra; and Wong et al. (2011), supra)), and (2) the Glue Grant datasets containing patients with hospital-acquired infections and day-matched non-infected patients, from which we used only patients in the buffy coat sample cohort (Table 1). The Glue Grant trauma cohorts were sampled at roughly 0.5, 1, 4, 7, 14, 21, and 28 days since injury; these cohorts were thus divided into their sampling time bins, creating subgroups in which patients diagnosed with an infection in a given time bin can be compared to non-infected patients in the same time bin. For the buffy coat samples, there were at least 10 patients present in five time bins, and these were thus taken for further study.
Thus, we used a total of 9 cohorts comparing time-matched SIRS/trauma to sepsis/infection, comprising 663 samples (326 SIRS/trauma controls and 337 sepsis/infection cases). Table 2 shows the cohorts in the multi-cohort analysis.
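The time-binning of the Glue Grant samples can be sketched as follows. This is an illustration under two assumptions not stated in the text: each sample is assigned to the nearest of the roughly 0.5, 1, 4, 7, 14, 21, and 28 day sampling times, and a bin is kept only when both the infected and the non-infected groups contain at least 10 distinct patients. The helper names are hypothetical.

```python
from collections import defaultdict

# Approximate Glue Grant sampling times, in days since injury (from the text).
NOMINAL_DAYS = (0.5, 1, 4, 7, 14, 21, 28)


def nearest_bin(days_since_injury):
    """Assign a sample to the closest nominal sampling day (ties go to the earlier day)."""
    return min(NOMINAL_DAYS, key=lambda d: abs(d - days_since_injury))


def usable_bins(samples, min_patients=10):
    """Return the time bins with at least min_patients distinct patients in each group.

    samples: iterable of (patient_id, days_since_injury, infected) tuples.
    """
    groups = defaultdict(lambda: {True: set(), False: set()})
    for patient_id, days, infected in samples:
        groups[nearest_bin(days)][bool(infected)].add(patient_id)
    return sorted(
        day for day, g in groups.items()
        if len(g[True]) >= min_patients and len(g[False]) >= min_patients
    )
```

Counting distinct patient IDs rather than samples prevents a patient sampled twice within one bin from inflating that bin's size.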

We then applied our previously described (Khatri et al. (2013) J Exp Med 210:2205-2221; herein incorporated by reference in its entirety) multi-cohort gene expression analysis framework to compare SIRS/trauma with sepsis/infection, including all 9 cohorts in a leave-one-dataset-out fashion. The output from this analysis underwent a three-step thresholding process (false discovery rate (FDR) < 1% for both pooled effect size and Fisher’s method, inter-dataset heterogeneity p > 0.01, and absolute summary effect size fold change > 1.5), which yielded 82 genes differentially expressed between SIRS/trauma and sepsis patients across all time-points (summary statistics for all 82 genes shown in Table 8). To obtain the most parsimonious set of significant genes that best discriminates between classes, we carried out a greedy forward search to identify which combination of the 82 genes produced the best improvements in AUC across all discovery datasets. Here discrimination is based on an ‘infection Z-score’ that combines gene expression levels (using the difference of geometric means between positive genes, which are over-expressed in sepsis, and negative genes, which are under-expressed) into a standardized score for each sample in each dataset. This yielded a final set of 11 genes (6 over- and 5 under-expressed in sepsis; Table 3 and FIG. 2). The mean ROC AUC of this 11-gene set in the 9 discovery cohorts was 0.87 (range 0.70-0.98; FIG. 3A and FIG. 9).
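The infection Z-score and the greedy forward search can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' code: expression values are assumed to be positive (for example, log2 intensities); each dataset's raw scores are standardized to mean 0 and standard deviation 1; the search objective is taken to be the mean AUC across discovery datasets; and the search stops when no candidate improves that mean by more than a hypothetical `min_gain`. Standardizing within a dataset does not change that dataset's AUC; it only puts scores from different datasets on a common scale.

```python
import math
from statistics import mean, pstdev


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def infection_z_scores(expr, up_genes, down_genes):
    """Standardized per-sample score: geomean(up genes) - geomean(down genes).

    expr maps gene -> list of positive expression values, one per sample.
    An empty gene group contributes 0 so that single-direction sets can be scored.
    """
    n_samples = len(next(iter(expr.values())))
    raw = []
    for i in range(n_samples):
        up = geometric_mean([expr[g][i] for g in up_genes]) if up_genes else 0.0
        down = geometric_mean([expr[g][i] for g in down_genes]) if down_genes else 0.0
        raw.append(up - down)
    mu, sd = mean(raw), pstdev(raw)
    # Standardize within the dataset; a constant score maps to all zeros.
    return [(r - mu) / sd if sd > 0 else 0.0 for r in raw]


def auc(scores, labels):
    """Mann-Whitney AUC: P(score of an infected sample > score of a non-infected one)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def forward_search(datasets, up_genes, down_genes, min_gain=0.0):
    """Greedily add the gene that most increases mean AUC across datasets.

    datasets: list of (expr, labels) pairs, with labels True for infected samples.
    Stops when the best candidate improves mean AUC by no more than min_gain.
    """
    chosen_up, chosen_down, best = [], [], 0.5
    candidates = [(g, True) for g in up_genes] + [(g, False) for g in down_genes]
    while candidates:
        trials = []
        for gene, is_up in candidates:
            up = chosen_up + [gene] if is_up else chosen_up
            down = chosen_down if is_up else chosen_down + [gene]
            score = mean(auc(infection_z_scores(expr, up, down), labels)
                         for expr, labels in datasets)
            trials.append((score, gene, is_up))
        score, gene, is_up = max(trials)  # equal AUCs are broken by gene name
        if score - best <= min_gain:
            break
        best = score
        (chosen_up if is_up else chosen_down).append(gene)
        candidates.remove((gene, is_up))
    return chosen_up, chosen_down, best
```

With real data, `datasets` would hold one entry per discovery cohort, and `up_genes`/`down_genes` would be the 82 significant genes split by the sign of their summary effect size.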

Glue Grant Sorted-Cell Cohort Validation

The Glue Grant trauma cohorts have two independent sub-cohorts: the buffy coat cohort (samples processed 2004-2006 on Affymetrix array GPL570) and the sorted-cells cohort, which included neutrophils, monocytes, and T-cells (samples processed 2008-2011 on custom Glue Grant-Human (GGH) arrays). These cohorts comprise separate patients, separated in time, and profiled using different technologies. While their inclusion criteria and enrolling sites are largely the same, they are otherwise independent. We thus validated our 11-gene signature in the sorted-cell Glue Grant cohorts. Here we split the sorted-cell cohorts into the same time-bins as the discovery buffy-coat cohorts, and treated each time-bin separately.

From the sorted-cells sub-cohort, we expected the neutrophil set to perform most similarly to a whole-blood sample, as neutrophils make up 75-85% of the total leukocyte pool after trauma in both infected and non-infected patients (and hence contribute most of the gene expression present in peripheral blood) (FIG. 8). Indeed, the 11-gene set performed very well at separating time-matched non-infected trauma patients from septic trauma patients (4 cohorts, 218 samples; mean AUC 0.83, range 0.73-0.89) (FIG. 3B). Surprisingly, the 11-gene set also showed discriminatory power in the monocytes and T-cells from these same patients (monocytes AUC range 0.71-0.97, T-cells AUC range 0.69-0.9) (FIGS. 10A, 11A). Since we excluded any sorted-cell datasets from the multi-cohort analysis, we did not expect diagnostic capability in these cell types. Interestingly, in the sorted-cells cohort, AUC increases with greater time since initial trauma; this may suggest that inflammation due to infection is easier to discriminate as the ‘genomic storm’ of traumatic injury begins to resolve.

Examination of the 11-Gene Set in the Glue Grant Cohorts

As expected, in the Glue Grant buffy coat cohort, patients within +/- 24 hours of diagnosis of infection have significantly higher infection Z-scores at all time-points as compared to time-matched patients without infection; this was validated in the neutrophils cohort (repeated-measures ANOVA p<0.0001; FIGS. 3C-3D, Table 9A). Comparison of the infection Z-score by time since injury in the buffy coat cohort shows a significant decline over time (repeated-measures ANOVA change over time p<0.0001), but there appears to be a lesser (though still significant) effect in the neutrophils validation cohort (repeated-measures ANOVA change over time p<0.05) (FIGS. 3C-3D, Table 9A). The interaction of group with time since injury was not significant in either discovery or validation cohorts, suggesting that the decline in infection Z-scores over time for both groups is likely due to recovery from traumatic injury resulting in reduced inflammation (Table 9A).

Next, we analyzed how infection Z-scores changed in infected patients before and after diagnosis of infection (mostly samples that were not included in identifying the 11-gene set). We grouped the samples from patients who were ever diagnosed with infection during the same hospital stay into four groups according to their time from diagnosis of infection: greater than 5 days prior to infection, 5 to 1 days prior to infection, within +/- 1 day of diagnosis of infection, or 2 to 5 days after diagnosis of infection. No group besides the +/- 1 day group was included in the multi-cohort analysis for discovery of the 11-gene set. We further divided these groups into bins according to days since injury. Within each time-bin, the infection Z-scores for the diagnostic groups increased significantly as they progressed towards infection for both the discovery buffy coat cohort and the validation neutrophils cohort (Jonckheere trend (JT) test p<0.01; FIGS. 3E-3F). Furthermore, in all cohorts, the infection Z-score declined in the groups that were 2-5 days after infection diagnosis, when patients are beginning to recover from infection, presumably due to antibiotic treatment. This may also explain the increase in diagnostic power as time increases since initial injury. We emphasize that the resulting ‘peak’ in infection Z-score around the time of infection diagnosis validates the association of the infection Z-score with clinical infection, because neither the >5 days prior cohorts, the 1-5 days prior cohorts, nor the 2-5 days after cohorts were included in the multi-cohort analysis, yet they still show the hypothesized trends in both the discovery buffy coat cohort and the validation neutrophils cohort. Similar results were seen in the monocytes and T-cells samples (same patients as the neutrophils validation cohorts; FIGS. 10B, 11B).
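The Jonckheere-Terpstra statistic behind this trend test counts, over every pair of ordered groups, how often a value from the earlier group is smaller than a value from the later group. A minimal sketch using the large-sample normal approximation, assuming a one-sided alternative of an increasing trend and no tie correction (ties are scored as one half, but the variance is not adjusted for them); the function name is hypothetical.

```python
import math


def jonckheere_terpstra(groups):
    """Jonckheere-Terpstra test for an increasing trend across ordered groups.

    groups: list of lists of values, ordered by the hypothesized trend
    (e.g. >5 days before, 5-1 days before, +/-1 day of infection diagnosis).
    Returns (J statistic, z score, one-sided p-value for an increasing trend).
    """
    j = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    j += 1.0 if x < y else 0.5 if x == y else 0.0
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean_j = (n * n - sum(s * s for s in sizes)) / 4
    # Null variance without tie correction.
    var_j = (n * n * (2 * n + 3) - sum(s * s * (2 * s + 3) for s in sizes)) / 72
    z = (j - mean_j) / math.sqrt(var_j)
    p_increasing = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of N(0, 1)
    return j, z, p_increasing
```

For three perfectly increasing groups of three values each, J = 27 against a null mean of 13.5 and a null standard deviation of 4.5, giving z = 3.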

Interestingly, at the time of admission, infection Z-scores in buffy coat samples were significantly higher in patients who were later infected during their hospital stays than in patients never infected during their hospital admission (p<0.01; neutrophils validation group p=0.05; FIGS. 3E-3F). One possibility is that there was a baseline difference in injury severity, and that this might influence the infection Z-score. Severely injured patients are known to be more susceptible to infection (Osborn et al. (2004) Crit Care Med 32:2234-2240). To test this hypothesis, we used linear regression of eventual hospital-acquired infection status, injury severity score, and their interaction to predict infection Z-score as the dependent variable (Table 9B). Both eventual hospital-acquired infection status and injury severity score were independently significant in predicting infection Z-score at admission, indicating that injury severity alone does not explain these effects. The interaction term was significant and negative in both the discovery buffy coat cohort and the validation neutrophils cohort samples, perhaps suggesting that higher infection Z-score at admission may indicate greater susceptibility to later infection. Further studies are needed to examine this observation.
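The regression model described here, Z = b0 + b1·infected + b2·ISS + b3·(infected × ISS), can be sketched with ordinary least squares. The pure-Python solver below is a hypothetical illustration that returns only the coefficients; the significance tests reported in Table 9B would additionally require standard errors, which this sketch does not compute.

```python
def ols(rows, y):
    """Ordinary least squares coefficients via the normal equations (X'X) b = X'y.

    Solved with Gauss-Jordan elimination and partial pivoting; assumes X'X is invertible.
    """
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    aug = [row + [v] for row, v in zip(xtx, xty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def design_row(later_infected, iss):
    """Intercept, eventual-infection indicator, injury severity score, and their interaction."""
    infected = 1.0 if later_infected else 0.0
    return [1.0, infected, float(iss), infected * iss]
```

A negative fitted interaction coefficient b3 means that the gap in admission Z-score between eventually infected and never-infected patients shrinks as injury severity increases.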

Clinical Utility in the Glue Grant

To test whether the infection Z-score might add to clinical determinations of infection, we compared logistic regression using SIRS criteria alone to that using SIRS criteria plus our infection Z-score in discriminating Glue Grant trauma patients (both buffy coat and neutrophils cohorts) with and without infection. The logistic regression model using SIRS criteria alone had an AUC of 0.64, while SIRS criteria plus the infection Z-score had an overall AUC (using a single coefficient for infections at all time-points) of 0.81 (FIG. 12). The continuous net reclassification index (NRI) is a measure of how many patients would be correctly reclassified by improving a disease marker; here the continuous NRI of adding the infection Z-score to SIRS alone was 0.90 (95% CI 0.62-1.17), where a continuous NRI greater than 0.6 is associated with ‘strong’ improvement in prediction (Pencina et al. (2012) Stat Med 31:101-113).
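The continuous NRI compares each patient's predicted risk under the old model (SIRS criteria alone) with the new model (SIRS criteria plus infection Z-score). A minimal sketch of the standard category-free formula, NRI = [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)], where unchanged risks count as neither up nor down. A confidence interval such as the one reported above would need bootstrapping or an analytic variance, which is omitted; the function name is hypothetical.

```python
def continuous_nri(p_old, p_new, events):
    """Category-free net reclassification improvement (ranges from -2 to 2).

    p_old, p_new: predicted risks from the old and new models for each patient.
    events: True for patients with the outcome (here, infection).
    """
    n_e = sum(1 for e in events if e)
    n_ne = len(events) - n_e
    if n_e == 0 or n_ne == 0:
        raise ValueError("need both events and non-events")
    up_e = down_e = up_ne = down_ne = 0
    for old, new, event in zip(p_old, p_new, events):
        if new > old:
            if event:
                up_e += 1
            else:
                up_ne += 1
        elif new < old:
            if event:
                down_e += 1
            else:
                down_ne += 1
    # Events should move up and non-events should move down under a better model.
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne
```

If every infected patient's risk rises and every non-infected patient's risk falls, the NRI reaches its maximum of 2, which puts the reported 0.90 in context.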

Independent Validation of the Infection Z-Score

Next, we validated our score in three independent longitudinal cohorts that included only trauma or ICU patients who eventually acquired infections: GSE6377 (McDunn et al., supra), GSE12838, and EMEXP3001 (Martin-Loeches et al., supra) (Table 4). All three cohorts followed patients from the day of admission through at least the day of infection diagnosis (mostly ventilator-associated pneumonia, VAP). Because all patients in each of the three cohorts acquired infections, they did not have time-matched non-infected controls. To compare the validation cohort infection cases with non-infected trauma patients, we used Glue Grant buffy coat non-infected controls. We internally normalized each cohort using housekeeping genes, and then co-normalized with the Glue Grant buffy coat patients using empiric-Bayes batch correction. Then, we compared the validation cohorts to the Glue Grant non-infected patients at matched time-points, which served as a time-varying reference. Comparing trauma/ICU patients to a time-matched baseline is necessary because our earlier findings (FIGS. 3C-3F) showed a change over time in infection Z-score in the non-infected patients (Table 9A). The three independent longitudinal trauma/ICU cohorts show that patients within +/- 1 day of infection are generally separable from time-matched non-infected Glue Grant patients, with ROC AUCs ranging from 0.68 to 0.84 (FIG. 4).

We further validated the 11-gene set in 8 additional independent datasets that compared healthy patients to those with bacterial or viral sepsis at admission using whole blood samples (total N=446: GSE11755 (Emonts et al., supra), GSE13015 (Pankla et al., supra), GSE20346 (Parnell et al., supra), GSE21802 (Bermejo-Martin et al., supra), GSE25504 (Smith et al. (2014) Nat Commun 5:4649), GSE27131 (Berdal et al., supra), GSE33341 (Ahn et al., supra), and GSE40396 (Hu et al., supra); Table 5). The infection Z-scores for all 8 datasets were combined in a single violin plot, showing excellent separation (Wilcoxon p<1e-63, FIG. 5A). The mean ROC AUC for separating healthy and septic patients is 0.98 (range 0.94-1.0, FIG. 5B).

Our results provide strong evidence that the infection Z-score declines over time since admission/injury in whole blood, buffy coat, neutrophils, and monocytes. We have also shown that non-time-matched comparison yields inaccurate classification of infection, especially for late-acquired infections in SIRS/trauma patients. Hence, comparing infection Z-scores of SIRS/trauma patients at admission with those of late-acquired sepsis/infection patients would be an inaccurate measure of diagnostic power. However, because the decrease in infection Z-score over time is relatively monotonic, comparison of admission SIRS/trauma/surgery patients with late-acquired sepsis/infection would provide a lower limit on detection ROC AUC for the infection Z-scores. In other words, because the infection Z-score decreases over time, if the non-infected patients tested at admission had been sampled later (at matched times to the sepsis patients), their infection Z-scores would be lower at that later time (and hence more easily separable from the higher infection Z-scores in the septic patients). Using this inference, we examined four independent datasets that compared SIRS/trauma/surgery patients either to the same patients later in their hospital course at onset of sepsis, or to a mixed cohort of patients with community-acquired and hospital-acquired sepsis. These datasets studied whole blood (EMTAB1548 (Almansa et al., supra)), neutrophils (GSE5772 (Tang et al. (2007), supra)), and PBMCs (GSE9960 (Tang et al. (2009), supra); EMEXP3621 (Vassiliou et al. (2013) Crit Care 17:R199)) (Table 6). In each of these four datasets, the infection Z-score separated late-acquired infections from admission SIRS or trauma, with ROC AUCs ranging from 0.48-0.76 in PBMCs to 0.86 in whole blood (FIG. 13).
We emphasize that these AUCs are expected to be lower due to their time-mismatched comparison, and are essentially the lower limits of what properly time-matched infection Z-scores would be in each of these cell compartments.

Finally, we examined our 11-gene set in one dataset comparing healthy patients or those with autoimmune inflammation to patients with acute bacterial infections after confirmation of diagnosis (GSE22098, n=274) (Berry et al., supra; Allantaz et al. (2007) J Exp Med 204:2131-2144). Exact sampling times are not available, but confirmation of infection typically takes 24-72 hours, so these infection samples are expected to show lower Z-scores than at the time of diagnosis. Still, the infection Z-score is able to discriminate healthy and autoimmune inflammation patients from those with acute infections (ROC AUC 0.72; FIG. 14). Considering that cohorts studying autoimmune inflammation were not included in our discovery, this provides validation of the specificity of the infection Z-score for infectious inflammation.

The Effect of Infection Type on Infection Z-Score

In order to examine whether there were any infection-type-specific differences in the infection Z-score, we compared patients infected with Gram-positive vs. Gram-negative bacteria, as well as patients with viral infections to those with bacterial infections. The Glue Grant patients were not analyzed, as there were too few time-matched infection patients in each sub-cohort. Four datasets had information on Gram-positive versus Gram-negative infection, and four had data on bacterial vs. viral infections; in neither case was there a clear trend of differences in infection Z-score based on infection subtype (Table 10).

Gene Set Pathway Evaluation and Transcription Factor Analysis

Having validated the 11-gene set, we examined whether any mechanism might explain why these genes were acting in concert. We analyzed the 11-gene set with Ingenuity Pathway Analysis, which showed that several of the genes are under the control of IL-6 and JUN (FIG. 16). All 11 genes identified by the multi-cohort analysis were loaded into both EncodeQT and PASTAA (chosen for a mix of experimental results and in silico transcription factor predictions). EncodeQT found only one significant transcription factor among the 6 positive genes (Max), and none for the negative genes (EncodeQT Q-value ≤ 0.01, Table 10A). PASTAA showed enrichment for well-known pro-inflammatory transcription factors, such as NF-κB member c-Rel, Stat5, and Interferon Regulatory Factors (IRFs) 1 and 10 (Table 10B). However, since these two methods for transcription factor analysis did not agree on an enriched set of factors, no obvious conclusions can be drawn.

Since no obvious network driver was found, we next studied whether the genes were enriched in certain immune cell types that might explain their relation to sepsis. We searched GEO for human immune-cell-type-specific gene expression profiles, and found 277 samples from 18 datasets matching our criteria. We aggregated these into broad immune cell type signatures using mean gene expression scores. We then calculated standardized enrichment scores using the same method as the infection Z-score (difference of geometric means between positive and negative genes). We did this both for the initial set of 82 genes found to be significant in the multi-cohort analysis, and for the 11-gene set found after forward search (the genes that make up the infection Z-score) (FIG. 6). The set of all 82 significant genes was found to be highly enriched in band cells only (>4 standard deviations above the mean, p < 1e-6). Interestingly, the 11-gene set was significantly enriched in band cells (>2 standard deviations above the mean, p = 0.015) but also showed up-regulation in T-regulatory cells and down-regulation in dendritic cells. This suggests that one driving force in differential gene expression between sterile SIRS and sepsis is the presence of band cells; however, the best set of genes for diagnosis may incorporate information on multiple cell-type shifts at once.
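The cell-type enrichment score can be sketched as the same difference of geometric means, computed once per cell-type signature and then standardized across cell types, so that a value above 2 means more than two standard deviations above the mean of all cell types. The input layout (a mapping from cell type to mean expression per gene) and the function name are assumptions for illustration, and the p-values reported above are not computed here.

```python
import math
from statistics import mean, pstdev


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def cell_type_enrichment(profiles, up_genes, down_genes):
    """Standardized gene-set score for each cell type.

    profiles: dict mapping cell type -> dict of gene -> mean (positive) expression.
    Returns dict mapping cell type -> number of standard deviations above the
    mean score across all cell types.
    """
    raw = {
        cell: geometric_mean([expr[g] for g in up_genes])
        - geometric_mean([expr[g] for g in down_genes])
        for cell, expr in profiles.items()
    }
    values = list(raw.values())
    mu, sd = mean(values), pstdev(values)
    return {cell: (score - mu) / sd for cell, score in raw.items()}
```

Cell types passing a two-standard-deviation cutoff would then be `[c for c, z in scores.items() if z > 2]`. Because the scores are standardized across cell types, the result depends on which cell types are included in the reference panel.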

Discussion

The changes in gene expression after trauma and during sepsis have been described as a ‘genomic storm’ (Xiao et al. (2011) J Exp Med 208:2581-2590). The dozens of studies that we examined here have reported valuable insights into changes in gene expression that occur in response to SIRS, trauma, surgery, and sepsis; however, none of the prior single-study analyses has yet produced a common-use clinical tool to reduce the morbidity and mortality associated with sepsis. We used an integrated, multi-cohort analysis, based on a growing understanding of the time-dependent changes in gene expression, to distinguish gene expression in SIRS/trauma from that in sepsis. From this we found an 11-gene set that is optimized for diagnostic capability, which we validated in 15 independent cohorts. With further prospective clinical validation, this 11-gene set could assist with clinical sepsis diagnosis, which could, in turn, have a major impact on patient care.

Both infectious and non-infectious inflammation can lead to SIRS through activation of the same innate immune pathways (TLRs, RLRs, NLRs, etc.). As a result, the ‘typical’ pro-inflammatory genes and cytokines (such as TNF and the interleukins) are generally expressed in both sterile and infectious inflammation (Newton et al. (2012) Cold Spring Harb Perspect Biol 4). For instance, one recent study showed high correlation in gene expression between sterile inflammation (Glue Grant burns cohort) and four independent sepsis datasets, with as much as 93% of the genes changing in the same direction in the two conditions (Seok et al. (2013) Proc Natl Acad Sci USA 110:3507-3512). Thus, a standard hypothesis-driven search for biomarkers that are specifically differentially expressed between sterile SIRS and sepsis is unlikely to succeed, given that the ‘standard’ suite of cytokines and chemokines known to be expressed in sepsis is mostly also activated in sterile SIRS. However, several protein families, such as the lectins and CEACAMs, have been shown to have specificity only for pathogen-associated molecular patterns, raising the possibility of infection-specific innate immune signaling pathways (Geijtenbeek et al. (2009) Nat Rev Immunol 9:465-479; Crocker et al. (2007) Nat Rev Immunol 7:255-266; Kuespert et al. (2006) Curr Opin Cell Biol 18:565-571). We thus took a data-driven, unbiased approach. We searched specifically for genes that are significantly and homogeneously differentially expressed between sterile SIRS/trauma patients and sepsis patients across multiple cohorts.

We systematically identified all publicly available microarray-based genome-wide expression studies in SIRS, trauma, critical illness, acute infections, and sepsis. We then sorted through all datasets to identify those that compared non-infected SIRS, ICU, or trauma patients to patients with acute infections or sepsis. Time post-injury is known to be an important factor in gene expression after trauma (Xiao et al., supra; Seok et al., supra; Desai et al., supra; McDunn et al., supra). Across multiple independent cohorts, we showed that changes in gene expression over time are nonlinear but follow a similar trajectory. Furthermore, the normal recovery from trauma induces large changes in gene expression over time. Therefore, a comparison of gene expression at or near the time of injury with a later time point in the same patient (such as at time of diagnosis of infection) will yield a large number of differentially expressed genes solely due to the recovery process. It is thus very difficult to separate the relatively small changes in gene expression due to infection from the large changes caused by recovery. We therefore restricted our discovery cohorts to those studies that compared SIRS/trauma and sepsis/infection patients at matched time points. However, unlike trauma or surgery, sepsis has no easily defined ‘start’, since infections take time to manifest. We thus used as cases those patients within 48 hours of admission for sepsis or within +/- 24 hours of diagnosis of infection. These are the times at which infectious signs and symptoms are present and a clinical diagnosis is necessary. We used a multi-cohort analysis approach (Khatri et al. (2013) J Exp Med 210:2205-2221; Chen et al. (2014) Cancer Res 74:2892-2902) to compare SIRS/trauma and sepsis/infection patients in a time-matched manner, including 663 samples from 9 patient cohorts.
We then used a forward search, optimizing a sample-size-weighted ROC AUC, to select a parsimonious set of statistically significant genes (FDR < 1%, absolute summary effect size >1.5 fold) optimized for discriminatory power. The infection Z-score is computed from the geometric means of the up- and down-regulated genes in the resulting 11-gene set. It had a mean ROC AUC of 0.87 in the discovery cohorts for distinguishing SIRS/trauma from sepsis/infection patients.

We validated this gene set in an independent group of patients from the Glue Grant. The mean AUC for distinguishing sepsis from non-infectious inflammation was 0.83 in the neutrophil validation cohort. Diagnostic power showed a clear trend towards improvement with greater time since initial injury. Although we expect whole-blood transcriptional profiles to be largely driven by neutrophils, the signal in sorted cells will certainly differ from that in whole blood. Thus, use of sorted cells instead of whole blood for diagnosis is expected to result in lower discriminatory power. Despite this limitation, the infection Z-scores performed comparably in validation cohorts, especially at three or more days since initial injury, when initial traumatic inflammation wanes and hospital-acquired infections manifest (Hietbrink et al. (2013) Shock 40:21-27).

Using the extensive clinical phenotype data available for patients in the Glue Grant, we illustrated several important points about the application of the infection Z-score.

First, the infection Z-score showed a decline over time since injury that was similar in both infected and non-infected patients. We also used the time-varying non-infected baseline in the Glue Grant as reference thresholds. This allowed us to discriminate septic patients from non-infected trauma patients in three independent longitudinal cohorts, with ROC AUCs ranging from 0.68 to 0.84. Thus, for maximal discriminatory power, if the infection Z-score were to be tested prospectively in a longitudinal study, the diagnostic thresholds would need to be a function of the time since initial injury.

Second, the infection Z-scores increased over the days prior to infection, peaked within the +/- 1 day surrounding the time of infection diagnosis, and decreased afterwards (presumably due to treatment of infection). This observation raises the possibility that earlier diagnosis or stratification of patients at risk of developing sepsis may be possible using the 11-gene set, although further studies are required. In particular, we note that the early rise in infection Z-score that precedes a clinical diagnosis of infection is not a false positive but an ‘early positive’ result.

Finally, the infection Z-score was higher at admission in patients with a higher injury severity score (ISS). The initial infection Z-score thus depends on both ISS and relative time to clinical infection. However, trauma patients within 24 hours of admission are not usually suspected to have non-obvious infection (other than open wounds, peritoneal contamination, etc.). We would therefore not expect the infection Z-score to be of clinical utility in this group anyway.

In the Glue Grant buffy coat cohort, we examined patients who had all four SIRS markers available. In these patients, the binary SIRS criteria performed poorly in discriminating patients at time of infection from non-infected patients (ROC AUC 0.64). Adding the infection Z-score with a global cutoff (i.e., not broken into separate time bins) to the SIRS criteria increased the discriminatory power (ROC AUC 0.81), with a continuous net reclassification index (NRI) of 0.9. However, SIRS is only one of several criteria used to diagnose sepsis. Procalcitonin is a well-studied biomarker for differentiating sepsis from SIRS; two meta-analyses of procalcitonin both showed summary ROC AUCs of 0.78 (range 0.66-0.90) (Tang et al. (2007) Lancet Infect Dis 7:210-217; Uzzan et al. (2006) Crit Care Med 34:1996-2003; Cheval et al. (2000) Intensive Care Med 26 Suppl 2:S153-158; Ugarte et al. (1999) Crit Care Med 27:498-504). The average AUC in our discovery cohorts was 0.87, and the time-matched neutrophil validation cohort had a mean AUC of 0.83. Both are thus at least comparable to procalcitonin. We emphasize, however, that these markers need not be used in isolation. None of the publicly available datasets included procalcitonin levels at time of diagnosis of sepsis. Thus, any prospective study of the infection Z-score should include both traditional and new biomarkers. This would allow testing both for better diagnostic performance using biomarker combinations and for head-to-head comparisons.

We validated the infection Z-score in several additional external datasets. These included:
- three longitudinal cohorts of ICU/trauma patients who developed VAP/VAT;
- eight cohorts comparing healthy controls to patients with bacterial or viral sepsis;
- four cohorts comparing SIRS/trauma patients at admission to patients at mixed or later time points, using whole blood, neutrophils, or PBMCs; and
- one cohort comparing patients with autoimmune inflammation to patients with acute infection.

The infection Z-score had discriminatory power in every publicly available dataset that matched our inclusion criteria. Moreover, the infection Z-score showed no systematic trends with regard to infection type (Gram-positive versus Gram-negative and bacterial versus viral) across those datasets for which infection type information is available. In the Glue Grant data, the baseline infection Z-score decreases over time since injury. We therefore emphasize that a comparison of admission SIRS/trauma to a later time point in any of these compartments will have worse diagnostic power than would a time-matched study. Thus, the discriminatory power of the infection Z-score in the four independent non-time-matched cohorts may be a lower bound on the true discriminatory power in the respective blood compartments.

Some of the genes in the sepsis-specific 11-gene set have been previously associated with sepsis or infections, such as CEACAM1, C3AR1, GNA15, and HLA-DPB1 (Madsen-Bouterse et al. (2010) Am J Reprod Immunol 63:73-92; Wong et al. (2012) Crit Care 16:R213; Kwan et al. (2013) PLoS One 8:e60501). Based on in silico analyses, the regulatory control of these genes may be enriched for pro-inflammatory factors such as IL-6, JUN, c-Rel, Stat5, and IRF1/10, but no single common factor explained the network. The gene sets found here may be better explained by cell-type enrichment analyses. We show that band cells and the myeloid cell line are highly enriched for the whole set of 82 genes found to be significantly differentially expressed between sterile SIRS and sepsis. The finding of enrichment in band cells is particularly intriguing, as bands have previously been shown to help differentiate sterile SIRS and sepsis (Cavallazzi et al. (2010) J Intensive Care Med 25:353-357; Drifte et al. (2013) Crit Care Med 41:820-832). Further, band counts are highly variable whether measured by automated blood counters or by hand (Cornbleet et al. (2002) Clin Lab Med 22:101-136; van der Meer et al. (2006) Eur J Haematol 76:251-254), and no good serum marker exists. However, the 11-gene set may be better at distinguishing sepsis from sterile SIRS at least in part because it also includes information on increased T-regulatory cells and decreased dendritic cells. Both of these shifts have previously been implicated in sepsis (Saito et al. (2008) Tohoku J Exp Med 216:61-68; Venet et al. (2008) J Leukoc Biol 83:523-535; Grimaldi et al. (2011) Intensive Care Med 37:1438-1446). The connection between the 11-gene set and different immune cell types may help explain some sepsis biology, but these 11 genes certainly require further study.

The potential translation of the current study to clinical use rests on two factors.

First, both the 11-gene set and the protein products of these genes will need to be tested prospectively in a time-matched manner. Protein assays currently have a faster turnaround time than PCR assays. However, a number of advances in PCR technology have brought time to results down towards the range of clinical applicability (Park et al. (2011) Biotechnol Adv 29:830-839; Poritz et al. (2011) PLoS One 6:e26047).

Second, our results showed that the changes in gene expression due to normal recovery from a traumatic event (such as injury or surgery) mean that time must be properly accounted for in any gene expression study of acute illness. Our search found several studies that examine the time course after SIRS/trauma (GSE6377, GSE12838, GSE40012, EMEXP3001). It also found several that examine the time course since onset of sepsis/infection (GSE20346, GSE2713, GSE40012, EMEXP3850). However, we found only one publicly available microarray study (the Glue Grant) that followed a cohort of patients over time in which some patients developed infection and some did not. Based on our results, we therefore recommend that future studies of sepsis diagnostics be designed with longitudinal cohorts both with and without infection, to enable appropriate time-matched comparisons (Johnson et al. (2007) Ann Surg 245:611-621; Maslove et al. (2014) Trends Mol Med 20(4):204-213).

Overall, our comprehensive analysis of publicly available gene expression data in SIRS/trauma and sepsis has yielded a parsimonious 11-gene set. This set showed excellent discriminatory power both in the discovery cohorts and in 15 independent cohorts. Optimizing a clinical assay for this gene set to return results within a clinically relevant time window should be feasible. Further study will be needed both to confirm our clinical findings prospectively and to investigate the molecular pathways upstream of these genes.

Methods

Study Design

The purpose of this study was to use an integrated multi-cohort meta-analysis framework to analyze multiple gene expression datasets and identify a set of genes that can separate patients with sterile inflammation from patients with infectious inflammation. This framework has been described previously (Khatri et al. (2013) J Exp Med 210:2205-2221; Chen et al. (2014) Cancer Res 74:2892-2902).

Search

Two public gene expression microarray repositories (NIH GEO, ArrayExpress) were searched for all human datasets that matched any of the following search terms: sepsis, SIRS, trauma, shock, surgery, infection, pneumonia, critical, ICU, inflammatory, nosocomial. We kept for further study those datasets that compared patients with acute infections and/or sepsis to either healthy controls or patients with non-infectious inflammation (SIRS, trauma, surgery, autoimmunity). Datasets that used endotoxin injection as a model for SIRS or sepsis were not included.

Multi-Cohort Analysis

A multi-cohort meta-analysis comparing gene expression in non-infected SIRS/trauma patients versus patients with infections or sepsis was completed. All datasets comparing SIRS/trauma patients to septic/infected patients at the same time point were selected for inclusion in the multi-cohort analysis. Comparisons of patients at admission to those with sepsis at a later time point were therefore excluded (see justification for this model in the Results). The admission datasets were limited to samples from patients within 48 hours of admission. The Glue Grant trauma datasets were split into time bins of days since injury, excluding the initial 24 hours after admission (see Supplemental Methods). Each of these time bins was treated as a separate dataset in the multi-cohort analysis. Within each bin, time-matched never-infected patients are compared to patients within +/- 24 hours of diagnosis of infection (infection as defined above). Patients more than 24 hours after diagnosis of infection are thus censored in this comparison. This method allows detection of deviations due to infection from the ‘standard’ changes in gene expression over time due to recovery from trauma.
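The time-matched assignment of Glue Grant samples could be sketched as below, for illustration only. The `Sample` record, the helper names, and any bin edges passed in are hypothetical. Sampling day is assumed to be measured from admission. Samples drawn more than 24 hours before an infection diagnosis are assumed to be left out, since only never-infected patients and patients within +/- 24 hours of diagnosis enter the comparison. The 20-patient minimum per bin follows the Dataset Details.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    patient_id: str
    day: float                             # days since admission when drawn (assumed)
    infection_day: Optional[float] = None  # day of infection diagnosis; None = never infected


def label_sample(s: Sample, window: float = 1.0) -> Optional[str]:
    """Return 'control', 'case', or None if the sample is left out of the comparison."""
    if s.day < 1.0:
        return None                        # first 24 hours after admission are excluded
    if s.infection_day is None:
        return "control"                   # never-infected patient
    if abs(s.day - s.infection_day) <= window:
        return "case"                      # drawn within +/- 24 h of diagnosis
    return None                            # censored (>24 h after) or earlier pre-infection sample


def bin_samples(samples, edges, min_patients=20):
    """Group labelled samples into half-open bins [edges[k], edges[k+1]) and drop small bins."""
    bins = {}
    for s in samples:
        label = label_sample(s)
        if label is None:
            continue
        for lo, hi in zip(edges[:-1], edges[1:]):
            if lo <= s.day < hi:
                bins.setdefault((lo, hi), []).append((s.patient_id, label))
                break
    # Keep only bins with at least `min_patients` distinct patients.
    return {k: v for k, v in bins.items() if len({pid for pid, _ in v}) >= min_patients}
```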

After selecting the input datasets, we applied two meta-analysis methods: one combining effect sizes using Hedges’ g, and the other combining p-values using Fisher’s sum-of-logs method (see schematic in FIG. 15). Given n datasets, each method is applied n times in a leave-one-dataset-out fashion. A false discovery rate (FDR) threshold of 0.01 was set. Genes were selected if their q-value fell below this threshold in both the effect-size and the Fisher’s sum-of-logs analyses at every round of the leave-one-out analysis. The genes were then subjected to a test for heterogeneity across all input datasets, and a p-value greater than 0.01 was required for each gene. This removes genes that show significantly different effects across different datasets. Next, all genes with a summary effect size <1.5 fold were discarded. Finally, all genes found to be statistically significant in the multi-cohort analysis according to all three of the above criteria (Table 8) were subjected to a greedy forward search model. Starting with the most significant gene, each remaining gene is tentatively added to the gene score one at a time, and the gene giving the greatest increase in discriminatory ability is added to the final gene list.
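The per-gene statistics named above can be sketched with standard textbook formulas. This is a minimal sketch with the following choices:
- Hedges’ g uses the usual small-sample correction J = 1 - 3/(4(n1 + n2) - 9).
- Fisher’s statistic is referred to a chi-square distribution with 2k degrees of freedom, using the closed form that exists for even degrees of freedom.
- Benjamini-Hochberg adjustment is assumed as the q-value procedure; the text does not name one.

Random-effects pooling of the effect sizes, the heterogeneity test, and the leave-one-dataset-out loop are omitted. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def hedges_g(cases, controls):
    """Bias-corrected standardized mean difference (cases minus controls)."""
    n1, n2 = len(cases), len(controls)
    pooled_sd = math.sqrt(((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / (n1 + n2 - 2))
    d = (mean(cases) - mean(controls)) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    return j * d


def fisher_combined_p(pvalues):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df under the null."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # X / 2
    # Survival function for even df: exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```

A gene would pass the first filter only if its q-value is below 0.01 in both analyses at every leave-one-out round.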

Supplemental Methods

Dataset Details

Six of the publicly available whole blood datasets were from the Genomics of Pediatric SIRS/Septic Shock Investigators (GPSSSI) (Cvijanovich et al. (2008) Physiol Genomics 34:127-134; Shanley et al. (2007) Mol Med 13:495-508; Wong et al. (2007) Physiol Genomics 30:146-155; Wong et al. (2009) Crit Care Med 37:1558-1566; Wong et al. (2010) Pediatr Crit Care Med 11:349-355; Wong et al. (2011) Crit Care Med 39:2511-2517). These datasets contain overlapping samples, for which Hector Wong provided a key of the unique patients. Those unique patients were then gcRMA-normalized together and treated as a single dataset (GPSSSI Unique).

In addition to the publicly available datasets, we used the Inflammation and Host Response to Injury Program (Glue Grant) trauma datasets (Cobb et al., supra). The Glue Grant datasets consist of separate trauma patient cohorts sampled for either the entire buffy coat or sorted cells (neutrophils, monocytes, T-cells). Inclusion criteria are described elsewhere (Desai et al., supra). Patients were sampled at the following days after admission: 0.5, 1, 4, 7, 14, 21, and 28.

Glue Grant trauma cohort patients were classified as ‘infected’ if they met any of the following:
- a nosocomial infection (pneumonia, urinary tract infection, catheter-related bloodstream infection, etc.);
- a surgical infection (excluding superficial wound infections); or
- surgery for perforated viscus.

Infection definitions can be found at gluegrant.org/commonlyreferencedpubs.htm. For meta-analyses, samples drawn within +/- 24 hours of the day of diagnosis of infection were included as infection cases. Time points with fewer than 20 patients were not included in the multi-cohort analysis. The Glue Grant also contains burn patients, but these were not included due to the difficulty of distinguishing clinically relevant infections from colonized burn wounds. Use of the Glue Grant was approved by both the Glue Grant Consortium and the Stanford University IRB (protocol 29798).

Gene Expression Normalization

All Affymetrix datasets were downloaded as CEL files and re-normalized using gcRMA (R package affy). Output from Agilent chips and custom arrays analyzed on GenePix scanners was background corrected, within-array loess normalized, and then between-array quantile normalized (R package limma) (Smyth G, in Bioinformatics and Computational Biology Solutions Using R and Bioconductor, Gentleman R, Carey V, Dudoit S, Irizarry R and Huber W (eds.) (Springer, New York, 2005), pp. 397-420). Illumina datasets were quantile normalized. The Glue Grant sorted-cell datasets were analyzed using custom arrays (GGH-1, GGH-2). These were normalized as previously described and used in their post-processed state (Xu et al. (2011) Proc Natl Acad Sci USA 108:3707-3712). For all gene analyses, the mean of all probes mapping to the same gene was used as the gene expression level. All probe-to-gene mappings were downloaded from GEO from the most current SOFT files on Dec. 14, 2014.
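Two of the simpler steps above are sketched below in plain Python for illustration: collapsing probes to a gene-level mean, and quantile normalization. This is not the limma or affy code. Ties within a sample are broken by position rather than averaged. `collapse_probes` and `quantile_normalize` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_values, probe_to_gene):
    """Average all probes that map to the same gene in one sample; unmapped probes are dropped."""
    by_gene = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            by_gene[gene].append(value)
    return {gene: mean(values) for gene, values in by_gene.items()}


def quantile_normalize(columns):
    """Give every sample (column) the same distribution: the mean of the sorted columns.
    Ties within a column are broken by position (a simplification)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean(col[r] for col in sorted_cols) for r in range(n)]
    normalized_cols = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices from smallest to largest value
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = rank_means[rank]
        normalized_cols.append(normalized)
    return normalized_cols
```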

Labelled PCA Method

The labelled principal components analysis (PCA) method is an implementation of the constrained optimization described in equations 6 and 7 in section 4.1 of Koren and Carmel (IEEE Trans Vis Comput Graph (2004) 10:459-470). This optimization computes a linear transformation of the data that maximizes the pairwise distance between points in different labelled classes of the data. It does so while maintaining the constraint that the transformed data are orthogonal to each other. This orthogonality constraint differs slightly from the constraint employed by PCA, which demands that the transformed basis be mutually orthogonal, not the transformed data itself. Because of this difference in constraint, PCA is a projection scheme, whereas labelled PCA is a general linear transformation.

Call X the original dataset, with m rows (data points) and n columns (each data point has n elements). Y is an m-by-1 matrix of class labels, with a different value for each class. In other words, Y(i) equals Y(j) if and only if elements i and j are part of the same labelled class. L is a symmetric m-by-m matrix whose (i,j) entry is -1 unless Y(i) equals Y(j), in which case the entry is 0. Finally, the diagonal entries (where i equals j) are filled so that each row i sums to 0.

Koren and Carmel prove in Lemma 3.2 that the eigenvectors of transpose(X)*L*X provide a mapping that maximizes the pairwise distance between points in different labelled classes of the data. However, this transformation remains a projection scheme, which means that these eigenvectors are orthogonal to each other. This property limits the utility of the transformed data, but is more generalizable.

The general linear transformation used in this paper instead finds the vectors v that solve the equation Av = λBv, where A is transpose(X)*L*X and B is transpose(X)*X. Although more expressive, this method is not as robust as the labelled PCA projection scheme, since solutions to this generalized form require that B is not singular. Since B is not the identity matrix, the orthogonality constraint used in projections no longer has to hold. Instead, the basis vectors must be mutually orthogonal with respect to B, the covariance basis of the original data; that is, transpose(v_i)*B*v_j = 0 for i ≠ j.
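The generalized eigenproblem Av = λBv can be sketched in NumPy as follows. The sketch assumes B = transpose(X)*X is positive definite, so that a Cholesky factor B = CC^T exists. Substituting v = C^-T u then turns the problem into a standard symmetric eigenproblem. The helper names are hypothetical, and X is used as given because the text does not mention centering. The components with the largest eigenvalues are kept on the assumption that they carry the most between-class spread.

```python
import numpy as np


def class_laplacian(labels):
    """L[i, j] = -1 if samples i and j are in different classes, 0 otherwise;
    the diagonal is filled so that every row sums to zero."""
    y = np.asarray(labels)
    L = -(y[:, None] != y[None, :]).astype(float)
    np.fill_diagonal(L, -L.sum(axis=1))
    return L


def labelled_pca(X, labels, n_components=2):
    """Solve A v = lambda B v with A = X^T L X and B = X^T X (B must be nonsingular)."""
    X = np.asarray(X, dtype=float)
    A = X.T @ class_laplacian(labels) @ X
    B = X.T @ X
    C = np.linalg.cholesky(B)            # B = C C^T; raises LinAlgError if B is not positive definite
    C_inv = np.linalg.inv(C)
    w, U = np.linalg.eigh(C_inv @ A @ C_inv.T)  # equivalent standard symmetric eigenproblem
    V = C_inv.T @ U                      # back-transform so that V^T B V = I
    order = np.argsort(w)[::-1][:n_components]  # largest eigenvalues first
    V = V[:, order]
    return X @ V, w[order], V
```

The returned basis satisfies transpose(V)*B*V = I, so the transformed data X*V are orthonormal, matching the constraint described above.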

Labelled PCA Applications

The labelled PCA method is described in the Supplemental Methods. All datasets that contain a comparison of non-infectious SIRS, ICU, or trauma patients to sepsis patients were converted from probes to genes, then bound into a single large matrix and quantile normalized. Genes not present in all datasets were discarded. Patients with sepsis at any time (either on admission or hospital-acquired) were grouped in a single class. Lasso-penalized regression was then applied to separate sterile SIRS patients from sepsis patients (R package glmnet). Labelled PCA was carried out using the genes selected by the penalized regression, on the classes of sterile SIRS versus sepsis. The same graph was then re-labelled to show which samples are from hospital-acquired (or late) sterile SIRS or sepsis patients. The same set of genes from the penalized regression was then used in labelled PCA to compare healthy, sterile SIRS, and sepsis patients. This graph was likewise re-labelled to show which samples are from hospital-acquired or late sterile SIRS or sepsis patients.

To examine the effects of time on gene expression in SIRS/trauma and infection, all datasets that include serial measurements over time were selected. From the Glue Grant datasets, only buffy coat arrays were included, so as not to overwhelm the signal from the other datasets. The selected datasets were converted from probes to genes, then bound into a single large matrix and quantile normalized. Genes not present in all datasets were discarded. To reduce the gene set in an unbiased manner, CUR matrix decomposition was used to select the top 100 genes with the greatest orthogonality in the combined datasets (R package rCUR) (Bodor et al. (2012) BMC Bioinformatics 13:103). Labelled PCA was then carried out with each time point used as a different class (split at 1, 2, 3, 4, 5, 6, 10, 20, and 40 days). The resulting PCA was graphed in 3D, colored by time point, and a short video of rotations of the 3D space was captured using R package rgl.

Infection Z-Score

Genes that were found to be significant after multi-cohort analysis were separated according to whether their effects were positive or negative. Here ‘positive’ means a positive effect size in sepsis as compared to SIRS/trauma, and ‘negative’ means a negative effect size in sepsis as compared to SIRS/trauma. The class discrimination power of these gene sets was then tested using a single gene score. The gene score is computed as follows: take the geometric mean of the expression levels of all negative genes, multiply it by the ratio of counts of positive to negative genes, and subtract the result from the geometric mean of the expression levels of all positive genes. This was calculated for each sample in a dataset, and the scores for each dataset were then standardized to yield a Z-score (‘infection Z-score’). Genes not present in an entire dataset were excluded; genes missing for individual samples were set to 1. Some datasets (two-channel arrays) contain negative gene expression values, and the geometric mean yields imaginary values for negative input. To obtain an infection Z-score for these datasets, the entire dataset was scaled by the minimum value present in the dataset to ensure all values were positive.
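A minimal sketch of the infection Z-score for one dataset follows. It assumes each sample is given as a gene-to-expression mapping. By default, the weight on the negative term is the positive-to-negative count ratio, as worded in this paragraph. The Cell-Type Enrichment Tests section describes the same weight as the negative-to-positive ratio, so the weight is exposed as the hypothetical `neg_weight` parameter. Standardization uses the sample standard deviation (an assumption). The rescaling of two-channel data is not shown.

```python
import math
from statistics import mean, stdev


def geometric_mean(values):
    """exp(mean(log x)); all values must be positive."""
    return math.exp(mean(math.log(v) for v in values))


def gene_score(sample, pos_genes, neg_genes, neg_weight=None):
    """Geometric mean of positive genes minus weighted geometric mean of negative genes."""
    pos = [sample.get(g, 1.0) for g in pos_genes]  # gene missing for this sample -> 1
    neg = [sample.get(g, 1.0) for g in neg_genes]
    if neg_weight is None:
        # Ratio of positive to negative gene counts, as worded in this section.
        neg_weight = len(pos) / len(neg) if pos and neg else 1.0
    pos_part = geometric_mean(pos) if pos else 0.0
    neg_part = geometric_mean(neg) if neg else 0.0
    return pos_part - neg_weight * neg_part


def infection_z_scores(samples, pos_genes, neg_genes, neg_weight=None):
    """Score each sample (a gene -> expression dict) of one dataset, then standardize."""
    measured = set().union(*samples)               # genes present somewhere in this dataset
    pos = [g for g in pos_genes if g in measured]  # genes absent from the whole dataset are dropped
    neg = [g for g in neg_genes if g in measured]
    raw = [gene_score(s, pos, neg, neg_weight) for s in samples]
    mu, sd = mean(raw), stdev(raw)
    return [(r - mu) / sd for r in raw]
```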

Class discriminatory power was examined by comparing the infection Z-scores for classes of interest in each examined dataset. The infection Z-score distributions were examined with violin plots. Since they cannot be assumed to be normally distributed, they are shown with 25%-75% interquartile ranges and compared using the Wilcoxon rank-sum test. ROC curves of the infection Z-score were constructed comparing classes of interest (such as sterile SIRS compared to sepsis). The total area under the ROC curve (AUC) is shown, along with a 95% confidence interval.
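The ROC AUC used throughout can be computed from its rank interpretation, which also links it to the Wilcoxon rank-sum test: the AUC equals the Mann-Whitney U statistic divided by the product of the two group sizes. The text does not state how the 95% confidence interval was obtained. The Hanley-McNeil standard error below is one common choice and is an assumption, as are the function names.

```python
import math


def roc_auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 1/2),
    i.e. the Mann-Whitney U statistic divided by n_cases * n_controls."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def hanley_mcneil_ci(auc, n_cases, n_controls, z=1.96):
    """Approximate 95% CI for an AUC using the Hanley-McNeil (1982) standard error."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc * auc / (1 + auc)
    var = (auc * (1 - auc)
           + (n_cases - 1) * (q1 - auc * auc)
           + (n_controls - 1) * (q2 - auc * auc)) / (n_cases * n_controls)
    se = math.sqrt(max(var, 0.0))
    return max(0.0, auc - z * se), min(1.0, auc + z * se)
```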

Forward Search

To obtain a parsimonious gene set that discriminates SIRS/trauma patients from septic/infected patients, all genes found to be statistically significant in the multi-cohort analysis were subjected to a greedy forward search model. Starting with the most significant gene in the dataset, each remaining gene is tentatively added to the gene score one at a time, and the gene giving the greatest increase in discriminatory ability is added to the final gene list. Here, discriminatory ability was defined as a weighted ROC AUC: the infection Z-score is tested in each discovery dataset, and the resulting AUC is multiplied by the total number of samples in that dataset. At each step, the function maximizes the sum of weighted AUCs across all discovery datasets. In this way, excellent class discrimination in a small dataset does not outweigh modest gains in class discrimination in a very large dataset. The function stops at an arbitrarily defined threshold. We used a stopping threshold of one: when the function cannot find a gene that increases the total weighted discovery AUC of the current infection Z-score by more than one, it terminates. The final resulting gene set is thus maximized for discriminatory power in the discovery cohorts, although it is not guaranteed to be a global maximum. The probe-level data for the genes remaining after forward search are shown in Table 8.
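A compact sketch of the greedy forward search with a sample-size-weighted AUC and a stopping threshold of one. It makes the following assumptions:
- Each dataset is a list of gene-to-expression mappings with 0/1 labels (1 = sepsis/infection).
- Every gene is measured in every sample.
- Scores use the positive-to-negative count ratio from the Infection Z-Score section.

Per-dataset standardization is skipped, because a Z-transformation preserves the ordering of scores and therefore the AUC. All names are hypothetical.

```python
import math


def _geo(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def gene_score(sample, pos, neg):
    """Positive geometric mean minus negative geometric mean x (n_pos / n_neg)."""
    p = _geo([sample[g] for g in pos]) if pos else 0.0
    n = _geo([sample[g] for g in neg]) if neg else 0.0
    w = len(pos) / len(neg) if pos and neg else 1.0
    return p - w * n


def auc(cases, controls):
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def weighted_auc(datasets, pos, neg):
    """Sum over datasets of AUC x number of samples in that dataset."""
    total = 0.0
    for samples, labels in datasets:
        scores = [gene_score(x, pos, neg) for x in samples]
        cases = [s for s, y in zip(scores, labels) if y == 1]
        controls = [s for s, y in zip(scores, labels) if y == 0]
        total += auc(cases, controls) * len(samples)
    return total


def forward_search(datasets, ranked_genes, direction, threshold=1.0):
    """ranked_genes: significant genes, most significant first; direction[g] is +1 or -1."""
    def split(genes):
        return ([g for g in genes if direction[g] > 0],
                [g for g in genes if direction[g] < 0])

    selected = [ranked_genes[0]]
    best = weighted_auc(datasets, *split(selected))
    remaining = list(ranked_genes[1:])
    while remaining:
        # Try each remaining gene and keep the one with the largest gain.
        gain, gene = max((weighted_auc(datasets, *split(selected + [g])) - best, g)
                         for g in remaining)
        if gain <= threshold:  # stop unless the best candidate adds more than `threshold`
            break
        selected.append(gene)
        remaining.remove(gene)
        best += gain
    return selected
```

Because each step is greedy, the result depends on the starting gene and is a local optimum, as noted above.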

Discovery Cohort Examinations

The final gene score was used to compute infection Z-scores in each discovery dataset. The admission datasets were analyzed separately, each with its own ROC plot. For the hospital-acquired (Glue Grant) datasets, infection scores were standardized (converted into Z-scores) once for the whole cohort rather than separately for each time bin, in order to show changes over time in the same patients. The infection Z-scores were then analyzed for significance using repeated-measures analysis of variance. ROC curves were plotted for the individual time bins treated as separate datasets in the multi-cohort analysis.

For the Glue Grant datasets, two time-course analyses of the infection Z-score were carried out for both the buffy coat and neutrophil datasets.

First, the average infection Z-score was compared over time using linear regression for patients within +/- 24 hours of infection and for non-infected patients. Repeated-measures analysis of variance was used to compare infected and non-infected groups to each other and to test for the significance of changes over time.

Next, boxplots were constructed for each time window. In each window, the infection Z-scores of patients who were never infected were compared to those of patients at three stages relative to their diagnosis with infection: >5 days prior, 5-1 days prior, or within +/- 24 hours of diagnosis. For each time point (except for the 0-1 day window), the trend in infection Z-score across the different groups was tested with the Jonckheere trend (JT) test. The infection Z-scores at the admission time point ([0,1)) were tested as the outcome variable in multiple linear regression, examining the contributory effects of both injury severity score and time to infection.
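The Jonckheere trend test can be sketched as below, using the large-sample normal approximation without a correction for ties. The text does not state which variant or sidedness was used, so both are assumptions, and the function name is hypothetical. Groups are passed in their hypothesized order, for example: never infected, >5 days before diagnosis, 5-1 days before diagnosis, and within +/- 24 hours of diagnosis.

```python
import math


def jonckheere_terpstra(groups):
    """Test for an increasing trend across ordered groups.
    Returns (J statistic, z, one-sided p) via the normal approximation, no tie correction."""
    j_stat = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    # Count pairs where the later group has the larger value (ties = 1/2).
                    j_stat += 1.0 if y > x else 0.5 if y == x else 0.0
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean_j = (n * n - sum(s * s for s in sizes)) / 4
    var_j = (n * n * (2 * n + 3) - sum(s * s * (2 * s + 3) for s in sizes)) / 72
    z = (j_stat - mean_j) / math.sqrt(var_j)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return j_stat, z, p_one_sided
```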

Validation

The final gene set was tested in several validation cohorts completely separate from the discovery cohorts. The Glue Grant sorted-cell cohorts were broken into time bins, and AUCs were calculated separately for each time bin. Note that no samples within +/- 1 day of infection diagnosis were captured in these cohorts after 18 days post-injury, so the [18,24) day bin is never shown.

The validation cohorts included three datasets that examined trauma patients over time (GSE6377, GSE12838, and EMEXP3001). All patients in these datasets developed infections, mostly ventilator-associated pneumonia (VAP). These datasets do not include controls, so they were compared to the Glue Grant non-infected patients as a baseline. These three validation datasets and the Glue Grant buffy coat non-infected samples were first linearly scaled by a factor of the geometric mean of four housekeeping genes (GAPDH, ACTN1, RPL9, KARS) (Vandesompele et al. (2002) Genome Biol 3:RESEARCH0034). The datasets were then joined on overlapping genes and batch-corrected between datasets using the ComBat empirical Bayes batch-correction tool with parametric priors (R package sva) (Leek et al. (2012) Bioinformatics 28:882-883). The ComBat correction was controlled for day after injury, so that relative differences between days are preserved. The infection Z-score was then calculated for the joined datasets, and the validation datasets were plotted against the loess curve from the non-infected Glue Grant cohort. Finally, validation patients within +/- 24 hours of their diagnosis of infection were compared to day-matched, ComBat-co-normalized, non-infected Glue Grant buffy coat patients, and ROC curves were constructed.
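The housekeeping-gene scaling step might look like the sketch below. It assumes, in the spirit of Vandesompele et al., that the factor is computed per sample as the geometric mean of the four housekeeping genes. It also assumes expression values are on a linear scale, so that scaling means division. The ComBat correction and loess baseline are not shown, and the names are hypothetical.

```python
import math

HOUSEKEEPING = ("GAPDH", "ACTN1", "RPL9", "KARS")


def housekeeping_factor(sample, genes=HOUSEKEEPING):
    """Geometric mean of the housekeeping genes in one sample (gene -> expression dict)."""
    return math.exp(sum(math.log(sample[g]) for g in genes) / len(genes))


def scale_by_housekeeping(sample, genes=HOUSEKEEPING):
    """Divide every gene in the sample by that sample's housekeeping factor."""
    factor = housekeeping_factor(sample, genes)
    return {gene: value / factor for gene, value in sample.items()}
```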

All other datasets found in the initial search that allow comparison between healthy or SIRS/trauma patients and sepsis patients were used for simple class discrimination validation. All datasets conducted on whole blood or neutrophils are shown. Among studies carried out in PBMCs, only those that examined both SIRS/trauma and sepsis patients were selected. Datasets using PBMC samples that did not include both a sterile SIRS group and a sepsis group were excluded. All peripheral blood healthy vs sepsis datasets were being used to make the same comparison. They were therefore grouped into a single violin plot and tested jointly for separation (Wilcoxon rank-sum). ROC curves were constructed for each individual dataset separately to show the discriminatory capability of the infection Z-score within each dataset.

Glue Grant SIRS Evaluation

To evaluate the effectiveness of SIRS as a screening criterion for infection in the Glue Grant cohort, all patients were classified as either non-infected or within +/- 24 hours of infection, with infection as defined above. Patients were censored >24 hours after infection diagnosis. SIRS criteria were defined according to standard international guidelines:
- temperature <36 °C or >38 °C;
- respiratory rate >20 breaths/min or PaCO2 <32 mmHg;
- total WBC <4,000/µL or >12,000/µL; and
- heart rate >90 beats/min.

Patients missing any criteria were excluded. Each criterion was stored as a binary variable for each patient for each day. Logistic regression was run on the data both with and without inclusion of the Z-score, and ROC AUC was calculated for both models. The two models were then compared using the continuous net reclassification index (R package PredictABEL).
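The SIRS encoding and the model comparison can be sketched as follows. `sirs_flags` applies the thresholds listed above. `continuous_nri` uses the standard category-free definition: the net proportion of events whose predicted risk rises under the new model, plus the net proportion of non-events whose predicted risk falls. It is written from that definition rather than taken from PredictABEL. The logistic regression fits are omitted, and the names are hypothetical.

```python
def sirs_flags(temp_c, resp_rate, paco2_mmhg, wbc_per_ul, heart_rate):
    """Four binary SIRS criteria using the thresholds given in the text."""
    return {
        "temperature": temp_c < 36 or temp_c > 38,
        "respiratory": resp_rate > 20 or paco2_mmhg < 32,
        "wbc": wbc_per_ul < 4000 or wbc_per_ul > 12000,
        "heart_rate": heart_rate > 90,
    }


def continuous_nri(p_old, p_new, outcome):
    """Category-free NRI: net upward movement among events plus net downward among non-events."""
    events = [(o, n) for o, n, y in zip(p_old, p_new, outcome) if y == 1]
    nonevents = [(o, n) for o, n, y in zip(p_old, p_new, outcome) if y == 0]

    def net_up(pairs):
        ups = sum(n > o for o, n in pairs)
        downs = sum(n < o for o, n in pairs)
        return (ups - downs) / len(pairs)

    return net_up(events) - net_up(nonevents)
```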

Gene Set Evaluation

The final gene set was evaluated for transcription factor binding sites using two online tools, EncodeQT (Auerbach et al. (2013) Bioinformatics 29:1922-1924) and PASTAA (Roider et al. (2009) Bioinformatics 25:435-442). Positive and negative genes were evaluated separately, since they are hypothesized to be under separate regulatory control. The EncodeQT tool was used with 5000 base pairs upstream and 5000 base pairs downstream of the transcription start site. A similar analysis was carried out with PASTAA, examining the region 200 base pairs upstream of the transcription start site and considering only those factors that were conserved in both mouse and human. The top ten significant transcription factors were recorded for both analyses.

Cell-Type Enrichment Tests

GEO was searched for gene expression profiles of clinical samples of relevant immune cell types. The search was limited to samples run on Affymetrix platforms to ensure homogeneity of platform effects. All datasets used were downloaded in RAW format and gcRMA-normalized separately. For each sample, the mean of multiple probes mapping to the same gene was taken as the gene value. Genes not present in all samples were discarded. Where multiple samples corresponded to the same cell type, the mean of those samples was taken as the final value, creating a single vector for each cell type. To obtain a Z-score for a gene set in each cell type vector, the geometric mean of the 'positive' genes' expression is taken, and from it is subtracted the geometric mean of the 'negative' genes' expression, times the ratio of negative genes to positive genes (the same procedure as for the infection Z-score). These scores are then standardized across all cell types, so that each score represents the number of standard deviations from the group mean. The score thus represents how enriched a given gene set is in a given cell type relative to the other tested cell types.
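The cell-type enrichment steps above can be sketched in Python as follows. This is an illustration, not the authors' R code. It assumes that the negative-to-positive gene-count ratio scales only the geometric mean of the negative genes (one reading of the description), that expression values are strictly positive as `statistics.geometric_mean` requires, and that probe-to-gene collapsing has already been done. Function names are hypothetical.

```python
from statistics import geometric_mean, mean, stdev
from typing import Dict, List, Sequence


def cell_type_vector(samples: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the samples of one cell type gene by gene, keeping genes present in every sample."""
    shared = set.intersection(*(set(s) for s in samples))
    return {gene: mean(s[gene] for s in samples) for gene in shared}


def gene_set_score(expr: Dict[str, float], positive: Sequence[str],
                   negative: Sequence[str]) -> float:
    """Geometric mean of positive genes minus (geometric mean of negative genes * n_neg / n_pos)."""
    pos = geometric_mean([expr[g] for g in positive])
    neg = geometric_mean([expr[g] for g in negative])
    return pos - neg * (len(negative) / len(positive))


def standardize(scores: Dict[str, float]) -> Dict[str, float]:
    """Express each cell type's score in standard deviations from the mean across cell types."""
    values = list(scores.values())
    mu, sd = mean(values), stdev(values)
    return {cell: (v - mu) / sd for cell, v in scores.items()}
```

A typical flow would build one vector per cell type with `cell_type_vector`, score each vector with `gene_set_score`, and pass the resulting dictionary of per-cell-type scores to `standardize`.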

A total of 18 GEO datasets that matched the criteria were used: GSE3982 (Jeffrey et al. (2006) Nat Immunol 7:274-283), GSE5099 (Martinez et al. (2006) J Immunol 177:7303-7311), GSE8668 (Radom-Aizik et al. (2008) J Appl Physiol 104:236-243), GSE11292 (He et al. (2012) Mol Syst Biol 8:624), GSE12453 (Giefing et al. (2013) PLoS One 8:e84928), GSE13987 (Meyers et al. (2009) J Immunol 182:5400-5411), GSE14879 (Eckerle et al. (2009) Leukemia 23:2129-2138), GSE15743 (Stegmann et al. (2010) Gastroenterology 138:1885-1897), GSE16020 (Vinh et al. (2010) Blood 115:1519-1529), GSE16836 (Ancuta et al. (2009) BMC Genomics 10:403), GSE24759 (Novershtern et al. (2011) Cell 144:296-309), GSE28490 (Allantaz et al. (2012) PLoS One 7:e29979), GSE28491 (Allantaz et al., supra), GSE31773 (Tsitsiou et al. (2012) J Allergy Clin Immunol 129:95-103), GSE34515 (Frankenberger et al. (2012) Eur J Immunol 42:957-974), GSE38043 (Huen et al. (2013) Int J Cancer 133:373-382), GSE39889 (Malcolm et al. (2013) PLoS One 8:e57402), GSE42519 (Rapin et al. (2014) Blood 123:894-904), GSE49910 (Mabbott et al. (2013) BMC Genomics 14:632).

Two gene sets were tested in this manner: the entire set of genes found to be significant after the initial multi-cohort analysis, and the subset of genes found to be most diagnostic after forward search. The corresponding figures show the Z-score (enrichment for the given gene set) in each cell subtype (black dots), as well as a box plot of the overall distribution of Z-scores (outliers shown as open circles).

Statistics and R

All computations were carried out in the R language for statistical computing (version 3.0.2). The significance level for p-values was set at 0.05, and analyses were two-tailed unless specified otherwise.

TABLE 1 Publicly available gene expression datasets comparing SIRS/ICU/trauma to sepsis/infections. CAP, community-acquired pneumonia; ARDS, acute respiratory distress syndrome.
Dataset | Year | Submitting Author | Paper Reference Number | Control Used Here | Condition Used Here | Sample Type Used | Platform | Timepoints Present in Dataset (Days)
GPSSSI Unique | 2006-2011 | Wong | 27-32 | Pediatric ICU-SIRS | Sepsis and Septic Shock (bacterial infections only) | Whole Blood | GPL570 | 1-3
GSE28750 | 2011 | Sutherland | 39 | 24h-post-'major surgery' | Community-acquired sepsis | Whole Blood | GPL570 | Admission
GSE32707 | 2012 | Dolinay | 43 | MICU patients +/- SIRS, nonseptic | Sepsis, Sepsis + ARDS | Whole Blood | GPL10558 | 0 & 7
GSE40012 | 2012 | Parnell | 38 | SIRS (66% Trauma) | Sepsis from CAP (bacterial infections only) | Whole Blood | GPL6947 | Days 1-5 for both SIRS and Sepsis
Glue Grant - Trauma - Buffy Coat | 2004-2006 | Multiple | 23-26 | Trauma | Trauma with infection | Buffy Coat | GPL570 | 0.5, 1, 4, 7, 14, 21, 28

TABLE 2 All datasets used in the multi-cohort analysis. The numbers following the Glue Grant cohort titles indicate days since injury in the given cohort (for instance, [1,3) is patients from 1-3 days since injury).
Cohort | SIRS/Trauma Controls (n) | Sepsis/Infection Cases (n) | Total (n)
Admission Comparisons
GSE28750 | 11 | 10 | 21
GSE32707 | 55 | 48 | 103
GSE40012 | 24 | 41 | 65
GPSSSI Unique | 30 | 189 | 219
Hospital-Acquired Comparisons (Glue Grant buffy coat cohorts)
Glue Grant buffy coat [1,3) | 65 | 9 | 74
Glue Grant buffy coat [3,6) | 63 | 17 | 80
Glue Grant buffy coat [6,10) | 50 | 15 | 65
Glue Grant buffy coat [10,18) | 22 | 4 | 26
Glue Grant buffy coat [18,24) | 6 | 4 | 10
Total Used in Multi-cohort Analysis | 326 | 337 | 663

TABLE 3 The 11-gene set that separates SIRS/trauma from sepsis, with meta-analysis effect sizes, standard errors, and heterogeneity analyses.
Gene Symbol | Full Name | Pooled Effect Size | Pooled Effect Size Standard Error | Effect Size P-Value | Effect Size Q-Value | Sum-of-Logs Q-Value | Cochran's Q | Tau Squared | Inter-study Heterogeneity P-Value
CEACAM1 | carcinoembryonic antigen-related cell adhesion molecule 1 | 0.778 | 0.073 | 1.9E-26 | 6.5E-22 | 7.2E-14 | 16.43 | 0.023 | 0.037
ZDHHC19 | zinc finger, DHHC-type containing 19 | 1.083 | 0.130 | 6.6E-17 | 3.3E-13 | 7.1E-22 | 18.36 | 0.078 | 0.019
C9orf95 | nicotinamide riboside kinase 1 | 0.598 | 0.102 | 4.6E-09 | 1.4E-06 | 2.9E-12 | 12.64 | 0.032 | 0.125
GNA15 | guanine nucleotide binding protein (G protein), alpha 15 | 0.603 | 0.119 | 4.0E-07 | 4.3E-05 | 4.7E-08 | 10.66 | 0.030 | 0.222
BATF | basic leucine zipper transcription factor, ATF-like | 1.053 | 0.163 | 9.4E-11 | 6.2E-08 | 1.6E-19 | 17.49 | 0.115 | 0.025
C3AR1 | complement component 3a receptor 1 | 0.643 | 0.097 | 3.7E-11 | 3.0E-08 | 9.3E-08 | 3.99 | 0.000 | 0.858
KIAA1370 | family with sequence similarity 214, member A | -0.664 | 0.148 | 7.8E-06 | 4.5E-04 | 2.1E-10 | 18.79 | 0.095 | 0.016
TGFBI | transforming growth factor, beta-induced, 68 kDa | -0.730 | 0.108 | 1.2E-11 | 1.1E-08 | 2.0E-10 | 9.12 | 0.013 | 0.333
MTCH1 | mitochondrial carrier 1 | -0.686 | 0.135 | 4.1E-07 | 4.3E-05 | 7.9E-10 | 13.04 | 0.058 | 0.111
RPGRIP1 | retinitis pigmentosa GTPase regulator interacting protein 1 | -0.694 | 0.156 | 8.5E-06 | 4.7E-04 | 5.2E-09 | 16.93 | 0.103 | 0.031
HLA-DPB1 | major histocompatibility complex, class II, DP beta 1 | -0.659 | 0.157 | 2.6E-05 | 1.1E-03 | 8.6E-09 | 17.38 | 0.107 | 0.026
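The columns of Table 3 (pooled effect size, its standard error, Cochran's Q, tau squared, and an inter-study heterogeneity p-value) are the quantities a DerSimonian-Laird random-effects meta-analysis reports. This section does not restate the pooling method, so the Python sketch below assumes DerSimonian-Laird and takes per-study effect sizes and their variances as inputs. The chi-square tail used for the heterogeneity p-value is an exact closed form valid only for even degrees of freedom; as a consistency check, Q = 16.43 on 8 degrees of freedom (9 studies, per Table 8) gives p ≈ 0.037, the heterogeneity p-value listed for CEACAM1. Function names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X >= x) for a chi-square variable; this closed form is only valid for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def dersimonian_laird(effects: Sequence[float],
                      variances: Sequence[float]) -> Dict[str, Optional[float]]:
    """Random-effects pooling of per-study effect sizes with the DerSimonian-Laird tau^2."""
    if len(effects) < 2 or len(effects) != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z = pooled / se
    return {
        "pooled": pooled,
        "se": se,
        "p": math.erfc(abs(z) / math.sqrt(2.0)),  # two-sided normal p-value
        "Q": q,
        "tau2": tau2,
        "het_p": chi2_sf_even_df(q, df) if df % 2 == 0 else None,
    }
```

With nine studies per gene the heterogeneity test has 8 degrees of freedom, so the even-df closed form applies; for an odd number of degrees of freedom the sketch returns `None` rather than an approximation.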

TABLE 4
Dataset | Year | Submitting Author | Paper Reference Number | Control Used Here | Condition Used Here | Sample Type Used | Platform | Days Since Injury | N, control | N, case
Glue Grant Sorted Cells | 2008-2011 | Multiple | | Trauma patients without infection | Trauma patients +/-24 hours from diagnosis of infection | Neutrophils, Monocytes, T-Cells | GGH-1, GGH-2 (GPL11320) | [1,3) | 56 | 10
 | | | | | | | | [3,6) | 55 | 10
 | | | | | | | | [6,10) | 46 | 14
 | | | | | | | | [10,18) | 24 | 3

TABLE 5 Publicly available gene expression time-course datasets of trauma patients that develop infections. VAP, ventilator-associated pneumonia; VAT, ventilator-associated tracheobronchitis.
Dataset | Year | Submitting Author | Paper Reference Number | Control Used Here | Condition Used Here | Sample Type Used | Platform | N, control | N, case | Samples Used Here | Timepoints Present in Dataset
GSE6377 | 2008 | Cobb | 47 | None | ICU patients with eventual VAP | Buffy Coat | GPL201 | 0 | 11 | 99 | 1-21 days
GSE12838 | 2008 | Cobb | N/A | None | ICU patients with eventual VAP | Buffy Coat, Neutrophils | GPL570 | 0 | 4 | 60 | 1-11 days
E-MEXP-3001 | 2011 | Lopez | 35 | None | ICU patients with acquired VAP or VAT | Whole Blood | A-Agil-28 (GPL6480) | 0 | 8 | 56 | 1-7 days

TABLE 6 Publicly available gene expression datasets in whole blood or neutrophils comparing healthy patients to septic patients.
Dataset | Year | Submitting Author | Paper Reference Number | Control Used Here | Condition Used Here | Sample Type Used | Platform | N, control | N, case | Timepoints Present in Dataset
GSE11755 | 2008 | Emonts | 50 | Healthy Children | Children w/ meningococcal sepsis | Whole Blood | GPL570 | 3 | 8 | 0, 0.33, 1, 3
GSE13015 | 2008 | Chaussabel | 14 | Healthy or Type 2 Diabetes | Community-acquired or nosocomial sepsis | Whole Blood | GPL6106, GPL6947 | 20 | 63 | Mixed, admission or sepsis onset
GSE20346 | 2011 | Parnell | 49 | Pre/Post-Flu Vaccine | Bacterial pneumonia, Severe influenza | Whole Blood | GPL6947 | 36 | 45 | 1-5
GSE21802 | 2011 | Bermejo-Martin | 34 | Healthy | Flu + ARDS | Whole Blood | GPL6102 | 4 | 12 | 'Early' vs. 'Late' Sepsis
GSE25504 | 2014 | Smith | | Healthy neonates | Neonates with bacterial infection | Whole Blood | GPL570, GPL6947, GPL13667 | 44 | 44 | Infection onset
GSE27131 | 2011 | Berdal | 44 | Healthy | Severe Flu A | Whole Blood | GPL6244 | 7 | 14 | 0-6
GSE33341 | 2011 | Ahn | 42 | Healthy | Sepsis from Bacterial CAP | Whole Blood | GPL571 | 43 | 51 | Admission
GSE40396 | 2013 | Hu | 37 | Healthy Children after surgery | Children with infection + fever | Whole Blood | GPL10558 | 22 | 30 | Admission

TABLE 7 Publicly available gene expression datasets comparing sterile SIRS/trauma/autoimmunity patients to later or non-time-matched sepsis/infection patients.
Dataset | Year | Submitting Author | Paper Reference Number | Control Used Here | Condition Used Here | Sample Type Used | Platform | N, control | N, case | Timepoints Present in Dataset
GSE5772 | 2007 | Tang | 41 | ICU non-sepsis (43% trauma) | Sepsis - early + late | Neutrophils | GPL4274 | 23 | 70 | Mixed, admission or sepsis onset
GSE9960 | 2009 | Tang | 15 | ICU non-sepsis | Sepsis | Monocytes | GPL570 | 16 | 54 | Mixed, admission or sepsis onset
GSE22098 | 2010 | Chaussabel | | Healthy Controls & Autoimmunity | Infections after diagnosis | Whole Blood | GPL6947 | | | Autoimmunity during systemic phase; Infections after diagnosis confirmation
E-MEXP-3621 | 2012 | Harokopos | 54 | Trauma at Admission | Trauma at Onset of Sepsis | Monocytes | A-Affy-37 (GPL571) | 5 | 5 | Admission vs. onset of Sepsis
E-MTAB-1548 | 2014 | Almansa | 33 | Post-surgery (avg 2 days) | Sepsis after surgery (avg 4 days) | Whole Blood | A-MEXP-2183 (GPL10332) | 34 Post-Surg | 39 | Post-Surgery (avg 2 days), Onset of Sepsis (avg 4 days)

TABLE 8 Summary statistics for the 82 genes that passed significance, heterogeneity, and effect-size filtering after multi-cohort analysis
gene symbol | n studies | summary | se summary | tau2 | p value | Q | df | pval het | P fdr | F stat up | F pval up | F stat down | F pval down | F Qval up | F Qval down
ADAMTS3 | 9 | 0.648269 | 0.143917 | 0.07797 | 6.65E-06 | 14.83131 | 8 | 0.06251 | 0.000391 | 87.38284 | 4.25E-11 | 6.01495 | 0.996136 | 4.80E-09 | 1
ANKRD22 | 9 | 0.704397 | 0.068995 | 0 | 1.80E-24 | 3.023063 | 8 | 0.932902 | 3.14E-20 | 100.4755 | 1.81E-13 | 1.08229 | 1 | 3.88E-11 | 1
ANXA3 | 9 | 0.718406 | 0.166894 | 0.12894 | 1.67E-05 | 19.21339 | 8 | 0.013759 | 0.00079 | 94.41522 | 2.31E-12 | 1.448082 | 1 | 3.71E-10 | 1
AP3B2 | 9 | 0.691257 | 0.125869 | 0.074155 | 3.98E-08 | 18.56155 | 8 | 0.017389 | 7.22E-06 | 91.26694 | 8.55E-12 | 1.170627 | 1 | 1.17E-09 | 1
ARL8A | 9 | 0.702502 | 0.156795 | 0.106433 | 7.45E-06 | 17.26336 | 8 | 0.027481 | 0.00043 | 92.45977 | 5.21E-12 | 1.203348 | 1 | 7.69E-10 | 1
B3GNT8 | 9 | 0.615648 | 0.131508 | 0.052581 | 2.85E-06 | 12.60337 | 8 | 0.126245 | 0.000201 | 84.85925 | 1.19E-10 | 4.280585 | 0.999614 | 1.21E-08 | 1
BATF | 9 | 1.053446 | 0.162675 | 0.115021 | 9.43E-11 | 17.49358 | 8 | 0.025361 | 6.20E-08 | 154.598 | 9.46E-24 | 0.293516 | 1 | 1.65E-19 | 1
BPI | 9 | 0.648548 | 0.118551 | 0.029549 | 4.48E-08 | 10.58983 | 8 | 0.22604 | 7.93E-06 | 86.28391 | 6.67E-11 | 2.613547 | 0.99999 | 7.26E-09 | 1
BST1 | 9 | 0.591544 | 0.118229 | 0.029239 | 5.63E-07 | 10.57953 | 8 | 0.22668 | 5.64E-05 | 77.70947 | 2.15E-09 | 1.215268 | 1 | 1.53E-07 | 1
C1orf162 | 9 | 0.675288 | 0.144756 | 0.078314 | 3.09E-06 | 14.83231 | 8 | 0.062489 | 0.000213 | 90.18378 | 1.34E-11 | 1.807791 | 1 | 1.71E-09 | 1
C3AR1 | 9 | 0.642508 | 0.097145 | 0 | 3.74E-11 | 3.985702 | 8 | 0.858411 | 2.97E-08 | 79.13497 | 1.21E-09 | 1.848442 | 0.999999 | 9.34E-08 | 1
C9orf103 | 9 | 0.79588 | 0.146305 | 0.082027 | 5.33E-08 | 15.10186 | 8 | 0.057195 | 9.02E-06 | 103.9087 | 4.24E-14 | 0.692697 | 1 | 1.07E-11 | 1
C9orf95 | 9 | 0.598498 | 0.102092 | 0.032317 | 4.56E-09 | 12.63991 | 8 | 0.124855 | 1.37E-06 | 107.4787 | 9.26E-15 | 9.532782 | 0.946095 | 2.88E-12 | 1
CCR1 | 9 | 0.617951 | 0.114205 | 0.052764 | 6.27E-08 | 15.60038 | 8 | 0.04847 | 1.01E-05 | 80.84841 | 6.09E-10 | 3.026479 | 0.99997 | 5.11E-08 | 1
CD177 | 9 | 0.859005 | 0.155738 | 0.103421 | 3.47E-08 | 18.17687 | 8 | 0.019939 | 6.51E-06 | 121.6406 | 2.06E-17 | 0.32395 | 1 | 1.93E-14 | 1
CD63 | 9 | 0.715795 | 0.145235 | 0.07892 | 8.28E-07 | 14.84659 | 8 | 0.062198 | 7.87E-05 | 92.60416 | 4.91E-12 | 1.230922 | 1 | 7.31E-10 | 1
CD82 | 9 | 0.712222 | 0.153324 | 0.105659 | 3.40E-06 | 19.9932 | 8 | 0.010362 | 0.000229 | 108.0404 | 7.28E-15 | 1.689182 | 1 | 2.37E-12 | 1
CEACAM1 | 9 | 0.777702 | 0.073068 | 0.022592 | 1.87E-26 | 16.42612 | 8 | 0.036672 | 6.52E-22 | 118.0288 | 9.87E-17 | 1.316686 | 1 | 7.17E-14 | 1
CLEC5A | 9 | 0.742933 | 0.098047 | 0 | 3.53E-14 | 7.6215 | 8 | 0.471288 | 8.20E-11 | 102.1517 | 8.92E-14 | 0.355484 | 1 | 2.09E-11 | 1
DHRS9 | 9 | 0.588827 | 0.091062 | 0.039325 | 1.00E-10 | 19.65994 | 8 | 0.011702 | 6.49E-08 | 84.21748 | 1.55E-10 | 3.233439 | 0.999951 | 1.52E-08 | 1
EMR1 | 9 | 0.631226 | 0.137919 | 0.064062 | 4.72E-06 | 13.60366 | 8 | 0.092699 | 0.000295 | 82.87146 | 2.68E-10 | 1.278954 | 1 | 2.46E-08 | 1
FAM89A | 9 | 0.683704 | 0.070109 | 0.001367 | 1.81E-22 | 8.215543 | 8 | 0.412704 | 2.10E-18 | 109.2253 | 4.39E-15 | 1.63419 | 1 | 1.56E-12 | 1
FCER1G | 9 | 0.825093 | 0.113017 | 0.048372 | 2.86E-13 | 14.77229 | 8 | 0.063727 | 4.75E-10 | 112.3815 | 1.13E-15 | 0.608485 | 1 | 5.26E-13 | 1
FCGR1B | 9 | 0.656424 | 0.083424 | 0 | 3.59E-15 | 6.900977 | 8 | 0.547353 | 1.26E-11 | 93.22392 | 3.79E-12 | 0.387371 | 1 | 5.79E-10 | 1
FES | 9 | 0.619202 | 0.117056 | 0.027305 | 1.22E-07 | 10.3985 | 8 | 0.238162 | 1.69E-05 | 83.81039 | 1.83E-10 | 2.271224 | 0.999997 | 1.76E-08 | 1
FFAR3 | 9 | 0.625449 | 0.135784 | 0.060488 | 4.10E-06 | 13.3059 | 8 | 0.101749 | 0.000265 | 81.95591 | 3.89E-10 | 1.787622 | 1 | 3.43E-08 | 1
FIG4 | 9 | 0.607879 | 0.117383 | 0.027801 | 2.24E-07 | 10.43819 | 8 | 0.235607 | 2.70E-05 | 81.2461 | 5.18E-10 | 2.111909 | 0.999998 | 4.44E-08 | 1
GNA15 | 9 | 0.603381 | 0.119077 | 0.030288 | 4.04E-07 | 10.65968 | 8 | 0.221738 | 4.33E-05 | 81.07701 | 5.55E-10 | 1.382475 | 1 | 4.72E-08 | 1
GPR84 | 9 | 0.892627 | 0.120993 | 0.030557 | 1.61E-13 | 10.57938 | 8 | 0.22669 | 2.81E-10 | 125.5959 | 3.66E-18 | 0.122918 | 1 | 4.25E-15 | 1
HK3 | 9 | 0.736906 | 0.15041 | 0.090003 | 9.62E-07 | 15.78283 | 8 | 0.045596 | 8.75E-05 | 98.86862 | 3.57E-13 | 1.492149 | 1 | 7.03E-11 | 1
HP | 9 | 0.917673 | 0.138462 | 0.062412 | 3.41E-11 | 13.2873 | 8 | 0.102339 | 2.77E-08 | 130.1515 | 4.96E-19 | 0.217584 | 1 | 9.10E-16 | 1
IL10 | 9 | 0.585979 | 0.127675 | 0.057993 | 4.44E-06 | 14.7083 | 8 | 0.065072 | 0.00028 | 79.59046 | 1.01E-09 | 2.358741 | 0.999996 | 7.92E-08 | 1
IL18R1 | 9 | 0.589237 | 0.133191 | 0.056256 | 9.69E-06 | 12.9634 | 8 | 0.113115 | 0.000531 | 77.97628 | 1.93E-09 | 2.039255 | 0.999999 | 1.41E-07 | 1
KCNE1 | 9 | 0.644633 | 0.076133 | 0 | 2.51E-17 | 5.362592 | 8 | 0.718211 | 1.51E-13 | 90.16343 | 1.35E-11 | 1.418853 | 1 | 1.72E-09 | 1
LCN2 | 9 | 0.70721 | 0.135145 | 0.058578 | 1.67E-07 | 13.09133 | 8 | 0.108746 | 2.15E-05 | 95.83204 | 1.28E-12 | 1.497105 | 1 | 2.21E-10 | 1
LIN7A | 9 | 0.691051 | 0.109274 | 0.042661 | 2.55E-10 | 14.05438 | 8 | 0.080359 | 1.46E-07 | 102.0331 | 9.38E-14 | 1.583338 | 1 | 2.17E-11 | 1
OSCAR | 9 | 0.653819 | 0.124504 | 0.051008 | 1.51E-07 | 13.8118 | 8 | 0.086805 | 1.98E-05 | 96.49838 | 9.66E-13 | 1.272787 | 1 | 1.70E-10 | 1
OSTalpha | 9 | 0.867976 | 0.136737 | 0.060402 | 2.18E-10 | 13.1769 | 8 | 0.105906 | 1.29E-07 | 117.464 | 1.26E-16 | 0.18605 | 1 | 8.79E-14 | 1
P2RX1 | 9 | 0.601662 | 0.097231 | 0 | 6.10E-10 | 7.903705 | 8 | 0.442933 | 2.72E-07 | 77.33706 | 2.49E-09 | 2.562773 | 0.999992 | 1.72E-07 | 1
PADI2 | 9 | 0.617007 | 0.104176 | 0.048808 | 3.17E-09 | 17.62447 | 8 | 0.024225 | 1.01E-06 | 97.58179 | 6.13E-13 | 3.64118 | 0.999881 | 1.15E-10 | 1
PECR | 9 | 0.646105 | 0.122849 | 0.068752 | 1.45E-07 | 17.8734 | 8 | 0.022196 | 1.92E-05 | 96.60666 | 9.23E-13 | 2.68429 | 0.999988 | 1.63E-10 | 1
PLAC8 | 9 | 0.914226 | 0.133684 | 0.064878 | 7.99E-12 | 15.16198 | 8 | 0.056071 | 7.72E-09 | 129.0282 | 8.13E-19 | 0.166514 | 1 | 1.42E-15 | 1
PLB1 | 9 | 0.624304 | 0.101509 | 0.031745 | 7.74E-10 | 12.59051 | 8 | 0.126737 | 3.33E-07 | 88.34826 | 2.85E-11 | 2.460701 | 0.999994 | 3.33E-09 | 1
PNPLA1 | 9 | 0.661921 | 0.110402 | 0.02961 | 2.03E-09 | 11.37858 | 8 | 0.181157 | 7.00E-07 | 94.13861 | 2.59E-12 | 1.611789 | 1 | 4.14E-10 | 1
PPM1M | 9 | 0.601672 | 0.14071 | 0.070244 | 1.90E-05 | 14.1845 | 8 | 0.077081 | 0.000875 | 76.72304 | 3.19E-09 | 1.483244 | 1 | 2.12E-07 | 1
PSTPIP2 | 9 | 0.601434 | 0.106478 | 0.011848 | 1.62E-08 | 9.046349 | 8 | 0.338401 | 3.51E-06 | 78.6251 | 1.49E-09 | 0.989724 | 1 | 1.13E-07 | 1
RETN | 9 | 0.846118 | 0.138226 | 0.061624 | 9.28E-10 | 13.23063 | 8 | 0.104157 | 3.76E-07 | 117.1262 | 1.46E-16 | 0.323468 | 1 | 9.79E-14 | 1
RGL4 | 9 | 0.727597 | 0.097757 | 0 | 9.85E-14 | 5.163271 | 8 | 0.739991 | 1.91E-10 | 96.77518 | 8.60E-13 | 0.457406 | 1 | 1.54E-10 | 1
S100A12 | 9 | 0.822032 | 0.154856 | 0.099381 | 1.11E-07 | 16.53376 | 8 | 0.035347 | 1.55E-05 | 110.5822 | 2.45E-15 | 1.153413 | 1 | 9.85E-13 | 1
SEPHS2 | 9 | 0.655956 | 0.152148 | 0.095162 | 1.62E-05 | 16.32158 | 8 | 0.038002 | 0.000773 | 84.2591 | 1.52E-10 | 1.03841 | 1 | 1.50E-08 | 1
SETD8 | 9 | 0.620324 | 0.106869 | 0.052338 | 6.46E-09 | 18.25896 | 8 | 0.019367 | 1.83E-06 | 98.64859 | 3.91E-13 | 5.179065 | 0.998553 | 7.66E-11 | 1
SGSH | 9 | 0.617383 | 0.121888 | 0.066408 | 4.08E-07 | 17.50436 | 8 | 0.025265 | 4.34E-05 | 88.57382 | 2.60E-11 | 2.932124 | 0.999977 | 3.06E-09 | 1
SIGLEC9 | 9 | 0.778995 | 0.150995 | 0.090956 | 2.48E-07 | 15.8002 | 8 | 0.045331 | 2.91E-05 | 104.9377 | 2.73E-14 | 0.544755 | 1 | 7.57E-12 | 1
SLC26A8 | 9 | 0.661594 | 0.127735 | 0.063947 | 2.23E-07 | 18.16165 | 8 | 0.020047 | 2.69E-05 | 99.37806 | 2.88E-13 | 1.618236 | 1 | 5.77E-11 | 1
SPPL2A | 9 | 0.626134 | 0.136141 | 0.062284 | 4.24E-06 | 13.50619 | 8 | 0.09558 | 0.000273 | 79.19584 | 1.18E-09 | 1.025124 | 1 | 9.15E-08 | 1
SQRDL | 9 | 0.663981 | 0.12316 | 0.037073 | 7.00E-08 | 11.24886 | 8 | 0.187993 | 1.09E-05 | 86.94967 | 5.07E-11 | 0.811722 | 1 | 5.69E-09 | 1
TCN1 | 9 | 0.590797 | 0.104398 | 0.00911 | 1.52E-08 | 8.805806 | 8 | 0.358942 | 3.40E-06 | 77.88568 | 2.00E-09 | 1.321941 | 1 | 1.45E-07 | 1
ZDHHC19 | 9 | 1.082518 | 0.129588 | 0.077549 | 6.63E-17 | 18.36049 | 8 | 0.01868 | 3.30E-13 | 168.2246 | 2.03E-26 | 1.078125 | 1 | 7.06E-22 | 1
ZDHHC3 | 9 | 0.605151 | 0.086784 | 0.029399 | 3.10E-12 | 15.30001 | 8 | 0.053568 | 3.38E-09 | 120.0334 | 4.14E-17 | 4.85356 | 0.99907 | 3.13E-14 | 1
ARHGEF18 | 9 | -0.70898 | 0.169784 | 0.138608 | 2.97E-05 | 20.06666 | 8 | 0.010087 | 0.001268 | 1.1909 | 1 | 90.30675 | 1.27E-11 | 1 | 1.83E-09
CACNA2D3 | 9 | -0.64833 | 0.109938 | 0.015972 | 3.70E-09 | 9.394217 | 8 | 0.310139 | 1.15E-06 | 1.439578 | 1 | 86.30028 | 6.62E-11 | 1 | 7.19E-09
CNNM3 | 9 | -0.58818 | 0.120969 | 0.040958 | 1.16E-06 | 12.19782 | 8 | 0.142593 | 0.000102 | 3.703486 | 0.999865 | 78.88287 | 1.34E-09 | 1 | 8.64E-08
GLO1 | 9 | -0.58606 | 0.097388 | 0 | 1.77E-09 | 6.320283 | 8 | 0.611403 | 6.42E-07 | 1.224842 | 1 | 75.96508 | 4.31E-09 | 1 | 2.26E-07
GRAMD1C | 9 | -0.68641 | 0.110473 | 0.01627 | 5.19E-10 | 9.411211 | 8 | 0.308802 | 2.44E-07 | 0.248515 | 1 | 92.54547 | 5.03E-12 | 1 | 8.80E-10
HACL1 | 9 | -0.60233 | 0.126366 | 0.042383 | 1.87E-06 | 11.71196 | 8 | 0.164526 | 0.000148 | 1.289179 | 1 | 81.68863 | 4.33E-10 | 1 | 3.49E-08
HLA-DPB1 | 9 | -0.65892 | 0.156786 | 0.10708 | 2.64E-05 | 17.38174 | 8 | 0.026371 | 0.001142 | 1.65281 | 1 | 85.77693 | 8.20E-11 | 1 | 8.56E-09
KIAA1370 | 9 | -0.66377 | 0.148487 | 0.095204 | 7.81E-06 | 18.78508 | 8 | 0.016052 | 0.000446 | 1.651519 | 1 | 96.60836 | 9.22E-13 | 1 | 2.14E-10
KLHDC2 | 9 | -0.7076 | 0.098039 | 0 | 5.29E-13 | 7.407563 | 8 | 0.493364 | 8.39E-10 | 0.289342 | 1 | 97.63983 | 5.98E-13 | 1 | 1.53E-10
METAP1 | 9 | -0.5853 | 0.097402 | 0 | 1.87E-09 | 7.655966 | 8 | 0.467777 | 6.70E-07 | 1.536127 | 1 | 76.52258 | 3.45E-09 | 1 | 1.89E-07
MRPS35 | 9 | -0.62742 | 0.107996 | 0.013274 | 6.26E-09 | 9.160226 | 8 | 0.32896 | 1.79E-06 | 1.262215 | 1 | 84.65131 | 1.30E-10 | 1 | 1.28E-08
MTCH1 | 9 | -0.68598 | 0.135413 | 0.058253 | 4.07E-07 | 13.03704 | 8 | 0.110581 | 4.34E-05 | 2.017978 | 0.999999 | 92.8469 | 4.43E-12 | 1 | 7.94E-10
NOC3L | 9 | -0.59889 | 0.097342 | 0 | 7.63E-10 | 6.778439 | 8 | 0.560715 | 3.32E-07 | 1.720305 | 1 | 78.04294 | 1.88E-09 | 1 | 1.15E-07
ODC1 | 9 | -0.68104 | 0.127217 | 0.044613 | 8.63E-08 | 11.88186 | 8 | 0.156553 | 1.29E-05 | 3.023994 | 0.999971 | 95.2574 | 1.62E-12 | 1 | 3.47E-10
PRKRIR | 9 | -0.62305 | 0.109061 | 0.015161 | 1.11E-08 | 9.332856 | 8 | 0.314999 | 2.62E-06 | 2.126397 | 0.999998 | 84.0165 | 1.68E-10 | 1 | 1.59E-08
RPGRIP1 | 9 | -0.69388 | 0.155858 | 0.102714 | 8.51E-06 | 16.92937 | 8 | 0.030853 | 0.000474 | 0.714397 | 1 | 87.23139 | 4.52E-11 | 1 | 5.20E-09
RPUSD4 | 9 | -0.59638 | 0.097168 | 0 | 8.37E-10 | 6.368051 | 8 | 0.606078 | 3.56E-07 | 1.250779 | 1 | 76.49836 | 3.49E-09 | 1 | 1.90E-07
SETD1B | 9 | -0.5977 | 0.139034 | 0.068864 | 1.72E-05 | 14.10538 | 8 | 0.07906 | 0.000807 | 2.383355 | 0.999995 | 74.7257 | 7.05E-09 | 1 | 3.44E-07
TBC1D4 | 9 | -0.60254 | 0.114465 | 0.052354 | 1.41E-07 | 15.52911 | 8 | 0.049638 | 1.89E-05 | 1.277458 | 1 | 79.65881 | 9.83E-10 | 1 | 6.87E-08
TGFBI | 9 | -0.72996 | 0.107749 | 0.012798 | 1.25E-11 | 9.115395 | 8 | 0.332655 | 1.05E-08 | 3.877609 | 0.999811 | 96.80662 | 8.48E-13 | 1 | 2.02E-10
TOMM20 | 9 | -0.69877 | 0.118071 | 0.062035 | 3.25E-09 | 17.88411 | 8 | 0.022112 | 1.02E-06 | 2.961433 | 0.999975 | 119.9027 | 4.38E-17 | 1 | 8.48E-14
UBE2Q2 | 9 | -0.58758 | 0.097879 | 0.000743 | 1.94E-09 | 8.065668 | 8 | 0.427082 | 6.88E-07 | 2.259608 | 0.999997 | 77.0402 | 2.81E-09 | 1 | 1.59E-07
WDR75 | 9 | -0.60831 | 0.138795 | 0.066488 | 1.17E-05 | 13.79844 | 8 | 0.087173 | 0.00061 | 2.401615 | 0.999995 | 83.69362 | 1.92E-10 | 1 | 1.77E-08
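The 'F stat' and 'F pval' columns of Table 8 agree numerically with Fisher's sum-of-logs method, X = -2 * sum(ln p_i), compared against a chi-square distribution with 2k degrees of freedom: for CEACAM1, an F stat up of 118.0288 over 9 studies (18 degrees of freedom) gives p ≈ 9.9e-17, matching the listed 9.87E-17. A Python sketch under that assumption follows. Treating 'up' and 'down' as the two sets of one-sided per-study p-values is an inference from the column names, and the helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X >= x) for a chi-square variable; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def fisher_sum_of_logs(pvals: Sequence[float]) -> Tuple[float, float]:
    """Fisher's method: X = -2 * sum(ln p_i), referred to chi-square with 2k df."""
    if not pvals or any(not 0.0 < p <= 1.0 for p in pvals):
        raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in pvals)
    return stat, chi2_sf_even_df(stat, 2 * len(pvals))


# CEACAM1 "F stat up" from Table 8, 9 studies -> 18 degrees of freedom.
print(chi2_sf_even_df(118.0288, 18))  # about 9.9e-17
```

Because 2k is always even, the closed-form tail is exact for Fisher's statistic regardless of the number of studies combined.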

TABLE 9 Linear models of infection score in the Glue Grant data. (A) Repeated-measures ANOVA of Glue Grant cohorts examining the effects of time since injury and infection status on infection Z-score. (B) Linear regression of admission timepoint (Day 0-to-1 since injury) infection score data on injury severity score and infection status. Significance levels: P less than 0.001 '***', 0.01 '**', 0.05 '*'.
A. Repeated-measures ANOVA
Buffy Coat - DISCOVERY SET
Term | DF | Sum Sq | Mean Sq | F value | P value
Time since injury | 1 | 38.84 | 38.84 | 65.182 | 2.85E-14 ***
Infection Status | 1 | 59.83 | 59.83 | 100.413 | < 2e-16 ***
Time:Infection status | 1 | 0.97 | 0.97 | 1.635 | 0.202
Residuals | 251 | 149.56 | 0.6 | |
Neutrophils - VALIDATION SET
Term | DF | Sum Sq | Mean Sq | F value | P value
Time since injury | 1 | 3.33 | 3.33 | 4.822 | 2.92E-02 *
Infection Status | 1 | 32.32 | 32.32 | 46.743 | 8.32E-11 ***
Time:Infection status | 1 | 1.41 | 1.41 | 2.044 | 0.1543
Residuals | 214 | 147.99 | 0.69 | |
B. Linear Regression
Buffy Coat - DISCOVERY SET
Term | Estimate | Std Error | T stat | P value
(Intercept) | -0.281325 | 0.188184 | -1.495 | 0.13639
Injury Severity Score (ISS) | 0.020229 | 0.006067 | 3.33 | 0.00101 **
Eventual Infection | 0.913283 | 0.275058 | 3.32 | 0.00106 **
ISS:Eventual Infection | -0.019907 | 0.008214 | -2.423 | 0.0162 *
Residual standard error: 0.7764 on 215 degrees of freedom; F-statistic: 7.484 on 3 and 215 DF, p-value: 8.673e-05
Neutrophils - VALIDATION SET
Term | Estimate | Std Error | T stat | P value
(Intercept) | -0.740711 | 0.241641 | -3.07 | 0.00253 **
Injury Severity Score (ISS) | 0.029675 | 0.007585 | 3.91 | 0.000132 ***
Eventual Infection | 1.1357 | 0.372217 | 3.051 | 0.002645 **
ISS:Eventual Infection | -0.030582 | 0.011067 | -2.763 | 0.006353 **
Residual standard error: 0.8582 on 170 degrees of freedom; F-statistic: 6.19 on 3 and 170 DF, p-value: 0.0005129

TABLE 10 Comparison of infection Z-score across infection types. Shown are the infection classes present in the studied datasets for which n > 20 within 1 day of infection diagnosis. Student's t-tests were used for comparisons; p < 0.05 was considered significant.
Gram Positive versus Gram Negative
Study | N, Gram Negative | N, Gram Positive | Gram Negative Mean Score | Gram Positive Mean Score | T Statistic | DF | P value | Outcome
GSE9960 | 18 | 17 | 0.32 | 0.12 | 0.58 | 33.0 | 0.5672 | Not different
GSE13015-gpl6106 | 32 | 13 | 0.64 | -0.13 | 3.10 | 31.0 | 0.0041 | Higher Gram Pos
GSE33341 | 19 | 32 | 0.83 | 0.77 | 0.34 | 42.4 | 0.7336 | Not different
GPSSSI Unique | 56 | 87 | 0.26 | 0.61 | -2.44 | 128.1 | 0.0162 | Higher Gram Neg

Bacterial versus Viral
Study | N, Bacterial Infection | N, Viral Infection | Bacterial Infection Mean Score | Viral Infection Mean Score | T Statistic | DF | P value | Outcome
GSE20346 | 26 | 19 | 0.64 | 0.56 | 0.29 | 39.3 | 0.7770 | Not different
GSE40012 | 74 | 25 | 0.48 | 0.52 | -0.22 | 76.4 | 0.8230 | Not different
GSE40396 | 8 | 35 | 1.07 | 0.25 | 2.67 | 12.7 | 0.0194 | Bacterial Higher
GPSSSI Unique | 143 | 16 | 0.47 | 0.04 | 1.74 | 17.9 | 0.0994 | Not different
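The fractional degrees of freedom in Table 10 (for example, 42.4 and 128.1) are what the Welch-Satterthwaite approximation produces, and Welch's unequal-variance test is the default of R's `t.test`. The Python sketch below computes Welch's t statistic and degrees of freedom from group summaries, under the assumption that this variant was used; the p-value step, which needs a t-distribution CDF, is left out, and the function name is hypothetical.

```python
import math
from typing import Tuple


def welch_t(mean_a: float, sd_a: float, n_a: int,
            mean_b: float, sd_b: float, n_b: int) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from summary statistics."""
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    var_a = sd_a ** 2 / n_a  # squared standard error of mean_a
    var_b = sd_b ** 2 / n_b  # squared standard error of mean_b
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df
```

With equal group sizes and equal standard deviations the degrees of freedom reduce to n_a + n_b - 2, the Student value; unequal sizes or spreads, as in several Table 10 comparisons, give the non-integer values seen there.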

TABLE 11A Output from EncodeQT. The 6 positive and 5 negative genes were analyzed separately using default settings. Q-values are derived from Benjamini-Hochberg-corrected hypergeometric tests.
ENCODEQT - POSITIVE GENES +/- 5000 bp
Factor | Total Genes with Factor | POSITIVE Observed Genes | Q-value | POSITIVE Factor Rank
Max | 14735 | 6 | 0.00E+00 | 1
ENCODEQT - NEGATIVE GENES +/- 5000 bp
Factor | Total Genes with Factor | NEGATIVE Observed Genes | Q-value | NEGATIVE Factor Rank
No significant transcription factor interactions found (q < 0.05)

TABLE 11B Output from PASTAA. The 6 positive and 5 negative genes were analyzed separately using the region -200 base pairs from the transcription start site, searching over conserved human/mouse sequences. P-values are from the hypergeometric test.
PASTAA - Positive Genes - 200 bp from TSS, conserved human/mouse
Rank | Matrix | Transcription Factor | Association Score | P-Value
1 | ZBRK1_01 | N/A | 3.353 | 1.28E-03
2 | PAX_Q6 | Pax-1, Pax-2 | 2.967 | 3.70E-03
3 | IRF_Q6_01 | Irf-1, Irf-10 | 2.72 | 6.74E-03
4 | CREL_01 | C-rel | 2.647 | 7.42E-03
5 | GATA4_Q3 | Gata-4 | 2.522 | 1.02E-02
6 | PAX4_03 | Pax-4a | 2.522 | 1.02E-02
7 | PPAR_DR1_Q2 | Ppar-alpha, Ppar-beta | 2.521 | 1.02E-02
8 | STAT5A_04 | Stat5a | 2.503 | 1.02E-02
9 | PTF1BETA_Q6 | N/A | 2.372 | 1.43E-02
10 | MYB_Q3 | C-myb | 2.371 | 1.43E-02

PASTAA - Negative Genes - 200 bp from TSS, conserved human/mouse
Rank | Matrix | Transcription Factor | Association Score | P-Value
1 | KAISO_01 | N/A | 3.264 | 1.28E-03
2 | PAX5_01 | Pax-5 | 3.236 | 1.28E-03
3 | TCF11_01 | Lcr-f1 | 3.066 | 1.78E-03
4 | STRA13_01 | Stra13 | 2.823 | 4.06E-03
5 | HNF4ALPHA_Q6 | Hnf-4, Hnf-4alpha | 2.473 | 9.47E-03
6 | ARNT_02 | Arnt | 2.346 | 1.31E-02
7 | USF_Q6 | Usf1, Usf2a | 2.346 | 1.31E-02
8 | PAX4_01 | Pax-4a | 2.221 | 1.70E-02
9 | TFIII_Q6 | Tfii-i | 2.22 | 1.70E-02
10 | AP1_Q6_01 | Fosb, Fra-1 | 2.204 | 1.70E-02

REFERENCES

1. D. C. Angus, W. T. Linde-Zwirble, J. Lidicker, G. Clermont, J. Carcillo, M. R. Pinsky, Epidemiology of severe sepsis in the United States: analysis of incidence, outcome, and associated costs of care. Crit Care Med 29, 1303-1310 (2001).

2. T. Lagu, M. B. Rothberg, M. S. Shieh, P. S. Pekow, J. S. Steingrub, P. K. Lindenauer, Hospitalizations, costs, and outcomes of severe sepsis in the United States 2003 to 2007. Crit Care Med 40, 754-761 (2012).

3. C. A. Torio, R. A. Andrews, National Inpatient Hospital Costs: The Most Expensive Conditions by Payer, 2011. Statistical Brief #160 (Agency for Healthcare Research and Quality, Rockville, MD, August 2013).

4. D. F. Gaieski, M. E. Mikkelsen, R. A. Band, J. M. Pines, R. Massone, F. F. Furia, F. S. Shofer, M. Goyal, Impact of time to antibiotics on survival in patients with severe sepsis or septic shock in whom early goal-directed therapy was initiated in the emergency department. Crit Care Med 38, 1045-1053 (2010).

5. R. Ferrer, I. Martin-Loeches, G. Phillips, T. M. Osborn, S. Townsend, R. P. Dellinger, A. Artigas, C. Schorr, M. M. Levy, Empiric antibiotic treatment reduces mortality in severe sepsis and septic shock from the first hour: results from a guideline-based performance improvement program. Crit Care Med 42, 1749-1755 (2014).

6. R. P. Dellinger, M. M. Levy, A. Rhodes, D. Annane, H. Gerlach, S. M. Opal, J. E. Sevransky, C. L. Sprung, I. S. Douglas, R. Jaeschke, T. M. Osborn, M. E. Nunnally, S. R. Townsend, K. Reinhart, R. M. Kleinpell, D. C. Angus, C. S. Deutschman, F. R. Machado, G. D. Rubenfeld, S. Webb, R. J. Beale, J. L. Vincent, R. Moreno, Surviving Sepsis Campaign Guidelines Committee including The Pediatric Subgroup, Surviving Sepsis Campaign: international guidelines for management of severe sepsis and septic shock, 2012. Intensive Care Med 39, 165-228 (2013).

7. B. Coburn, A. M. Morris, G. Tomlinson, A. S. Detsky, Does this adult patient with suspected bacteremia require blood cultures? JAMA 308, 502-511 (2012).

8. B. M. Tang, G. D. Eslick, J. C. Craig, A. S. McLean, Accuracy of procalcitonin for sepsis diagnosis in critically ill patients: systematic review and meta-analysis. Lancet Infect Dis 7, 210-217 (2007).

9. B. Uzzan, R. Cohen, P. Nicolas, M. Cucherat, G. Y. Perret, Procalcitonin as a diagnostic test for sepsis in critically ill adults and after surgery or trauma: a systematic review and meta-analysis. Crit Care Med 34, 1996-2003 (2006).

10. C. Cheval, J. F. Timsit, M. Garrouste-Orgeas, M. Assicot, B. De Jonghe, B. Misset, C. Bohuon, J. Carlet, Procalcitonin (PCT) is useful in predicting the bacterial origin of an acute circulatory failure in critically ill patients. Intensive Care Med 26 Suppl 2, S153-158 (2000).

11. H. Ugarte, E. Silva, D. Mercan, A. De Mendonça, J. L. Vincent, Procalcitonin used as a marker of infection in the intensive care unit. Crit Care Med 27, 498-504 (1999).

12. J. P. Cobb, E. E. Moore, D. L. Hayden, J. P. Minei, J. Cuschieri, J. Yang, Q. Li, N. Lin, B. H. Brownstein, L. Hennessy, P. H. Mason, W. S. Schierding, D. J. Dixon, R. G. Tompkins, H. S. Warren, D. A. Schoenfeld, R. V. Maier, Validation of the riboleukogram to detect ventilator-associated pneumonia after severe injury. Ann Surg 250, 531-539 (2009).

13. W. Xiao, M. N. Mindrinos, J. Seok, J. Cuschieri, A. G. Cuenca, H. Gao, D. L. Hayden, L. Hennessy, E. E. Moore, J. P. Minei, P. E. Bankey, J. L. Johnson, J. Sperry, A. B. Nathens, T. R. Billiar, M. A. West, B. H. Brownstein, P. H. Mason, H. V. Baker, C. C. Finnerty, M. G. Jeschke, M. C. López, M. B. Klein, R. L. Gamelli, N. S. Gibran, B. Arnoldo, W. Xu, Y. Zhang, S. E. Calvano, G. P. McDonald-Smith, D. A. Schoenfeld, J. D. Storey, J. P. Cobb, H. S. Warren, L. L. Moldawer, D. N. Herndon, S. F. Lowry, R. V. Maier, R. W. Davis, R. G. Tompkins, Inflammation and Host Response to Injury Large-Scale Collaborative Research Program, A genomic storm in critically injured humans. J Exp Med 208, 2581-2590 (2011).

14. R. Pankla, S. Buddhisa, M. Berry, D. M. Blankenship, G. J. Bancroft, J. Banchereau, G. Lertmemongkolchai, D. Chaussabel, Genomic transcriptional profiling identifies a candidate blood biomarker signature for the diagnosis of septicemic melioidosis. Genome Biol 10, R127 (2009).

15. B. M. Tang, A. S. McLean, I. W. Dawes, S. J. Huang, R. C. Lin, Gene-expression profiling of peripheral blood mononuclear cells in sepsis. Crit Care Med 37, 882-888 (2009).

16. H. R. Wong, Clinical review: sepsis and septic shock--the potential of gene arrays. Crit Care 16, 204 (2012).

17. S. B. Johnson, M. Lissauer, G. V. Bochicchio, R. Moore, A. S. Cross, T. M. Scalea, Gene expression profiles differentiate between sterile SIRS and early sepsis. Ann Surg 245, 611-621 (2007).

18. V. L. Vega, A marker for posttraumatic-sepsis: searching for the Holy Grail around intensive care units. Crit Care Med 37, 1806-1807 (2009).

19. T. B. Geijtenbeek, S. I. Gringhuis, Signalling through C-type lectin receptors: shaping immune responses. Nat Rev Immunol 9, 465-479 (2009).

20. P. R. Crocker, J. C. Paulson, A. Varki, Siglecs and their roles in the immune system. Nat Rev Immunol 7, 255-266 (2007).

21. K. Kuespert, S. Pils, C. R. Hauck, CEACAMs: their role in physiology and pathophysiology. Curr Opin Cell Biol 18, 565-571 (2006).

22. D. M. Maslove, H. R. Wong, Gene expression profiling in sepsis: timing, tissue, and translational considerations. Trends Mol Med (2014).

23. J. P. Cobb, M. N. Mindrinos, C. Miller-Graziano, S. E. Calvano, H. V. Baker, W. Xiao, K. Laudanski, B. H. Brownstein, C. M. Elson, D. L. Hayden, D. N. Herndon, S. F. Lowry, R. V. Maier, D. A. Schoenfeld, L. L. Moldawer, R. W. Davis, R. G. Tompkins, P. Bankey, T. Billiar, D. Camp, I. Chaudry, B. Freeman, R. Gamelli, N. Gibran, B. Harbrecht, W. Heagy, D. Heimbach, J. Horton, J. Hunt, J. Lederer, J. Mannick, B. McKinley, J. Minei, E. Moore, F. Moore, R. Munford, A. Nathens, G. O'Keefe, G. Purdue, L. Rahme, D. Remick, M. Sailors, M. Shapiro, G. Silver, R. Smith, G. Stephanopoulos, G. Stormo, M. Toner, S. Warren, M. West, S. Wolfe, V. Young, Inflammation and Host Response to Injury Large-Scale Collaborative Research Program, Application of genome-wide expression analysis to human health and disease. Proc Natl Acad Sci USA 102, 4801-4806 (2005).

24. J. Seok, H. S. Warren, A. G. Cuenca, M. N. Mindrinos, H. V. Baker, W. Xu, D. R. Richards, G. P. McDonald-Smith, H. Gao, L. Hennessy, C. C. Finnerty, C. M. López, S. Honari, E. E. Moore, J. P. Minei, J. Cuschieri, P. E. Bankey, J. L. Johnson, J. Sperry, A. B. Nathens, T. R. Billiar, M. A. West, M. G. Jeschke, M. B. Klein, R. L. Gamelli, N. S. Gibran, B. H. Brownstein, C. Miller-Graziano, S. E. Calvano, P. H. Mason, J. P. Cobb, L. G. Rahme, S. F. Lowry, R. V. Maier, L. L. Moldawer, D. N. Herndon, R. W. Davis, W. Xiao, R. G. Tompkins, Inflammation and Host Response to Injury Large Scale Collaborative Research Program, Genomic responses in mouse models poorly mimic human inflammatory diseases. Proc Natl Acad Sci USA 110, 3507-3512 (2013).

25. K. H. Desai, C. S. Tan, J. T. Leek, R. V. Maier, R. G. Tompkins, J. D. Storey, Inflammation and the Host Response to Injury Large-Scale Collaborative Research Program, Dissecting inflammatory complications in critically injured patients by within-patient gene expression changes: a longitudinal clinical genomics study. PLoS Med 8, e1001093 (2011).

26. H. S. Warren, C. M. Elson, D. L. Hayden, D. A. Schoenfeld, J. P. Cobb, R. V. Maier, L. L. Moldawer, E. E. Moore, B. G. Harbrecht, K. Pelak, J. Cuschieri, D. N. Herndon, M. G. Jeschke, C. C. Finnerty, B. H. Brownstein, L. Hennessy, P. H. Mason, R. G. Tompkins, Inflammation and Host Response to Injury Large Scale Collaborative Research Program, A genomic score prognostic of outcome in trauma patients. Mol Med 15, 220-227 (2009).

27. N. Cvijanovich, T. P. Shanley, R. Lin, G. L. Allen, N. J. Thomas, P.Checchia, N. Anas, R. J. Freishtat, M. Monaco, K. Odoms, B. Sakthivel,H. R. Wong, G. o. P. S. S. S. Investigators, Validating the genomicsignature of pediatric septic shock. Physiol Genomics 34, 127-134(2008).

28. T. P. Shanley, N. Cvijanovich, R. Lin, G. L. Allen, N. J. Thomas, A.Doctor, M. Kalyanaraman, N. M. Tofil, S. Penfil, M. Monaco, K. Odoms, M.Barnes, B. Sakthivel, B. J. Aronow, H. R. Wong, Genome-levellongitudinal expression of signaling pathways and gene networks inpediatric septic shock. Mol Med 13, 495-508 (2007).

29. H. R. Wong, T. P. Shanley, B. Sakthivel, N. Cvijanovich, R. Lin, G.L. Allen, N. J. Thomas, A. Doctor, M. Kalyanaraman, N. M. Tofil, S.Penfil, M. Monaco, M. A. Tagavilla, K. Odoms, K. Dunsmore, M. Barnes, B.J. Aronow, G. o. P. S. S. S. Investigators, Genome-level expressionprofiles in pediatric septic shock indicate a role for altered zinchomeostasis in poor outcome. Physiol Genomics 30, 146-155 (2007).

30. H. R. Wong, N. Cvijanovich, G. L. Allen, R. Lin, N. Anas, K. Meyer,R. J. Freishtat, M. Monaco, K. Odoms, B. Sakthivel, T. P. Shanley, G. o.P. S. S. S. Investigators, Genomic expression profiling across thepediatric systemic inflammatory response syndrome, sepsis, and septicshock spectrum. Crit Care Med 37, 1558-1566 (2009).

31. H. R. Wong, R. J. Freishtat, M. Monaco, K. Odoms, T. P. Shanley, Leukocyte subset-derived genomewide expression profiles in pediatric septic shock. Pediatr Crit Care Med 11, 349-355 (2010).

32. H. R. Wong, N. Z. Cvijanovich, G. L. Allen, N. J. Thomas, R. J. Freishtat, N. Anas, K. Meyer, P. A. Checchia, R. Lin, T. P. Shanley, M. T. Bigham, D. S. Wheeler, L. A. Doughty, K. Tegtmeyer, S. E. Poynter, J. M. Kaplan, R. S. Chima, E. Stalets, R. K. Basu, B. M. Varisco, F. E. Barr, Validation of a gene expression-based subclassification strategy for pediatric septic shock. Crit Care Med 39, 2511-2517 (2011).

33. R. Almansa, E. Tamayo, M. Heredia, S. Gutierrez, P. Ruiz, E. Alvarez, E. Gomez-Sanchez, D. Andaluz-Ojeda, R. Ceña, L. Rico, V. Iglesias, J. I. Gomez-Herreras, J. F. Bermejo-Martin, Transcriptomic evidence of impaired immunoglobulin G production in fatal septic shock. J Crit Care 29, 307-309 (2014).

34. J. F. Bermejo-Martin, I. Martin-Loeches, J. Rello, A. Antón, R. Almansa, L. Xu, G. Lopez-Campos, T. Pumarola, L. Ran, P. Ramirez, D. Banner, D. C. Ng, L. Socias, A. Loza, D. Andaluz, E. Maravi, M. J. Gómez-Sánchez, M. Gordón, M. C. Gallegos, V. Fernandez, S. Aldunate, C. León, P. Merino, J. Blanco, F. Martin-Sanchez, L. Rico, D. Varillas, V. Iglesias, M. Marcos, F. Gandía, F. Bobillo, B. Nogueira, S. Rojo, S. Resino, C. Castro, R. Ortiz de Lejarazu, D. Kelvin, Host adaptive immunity deficiency in severe pandemic influenza. Crit Care 14, R167 (2010).

35. I. Martin-Loeches, E. Papiol, R. Almansa, G. López-Campos, J. F. Bermejo-Martin, J. Rello, Intubated patients developing tracheobronchitis or pneumonia have distinctive complement system gene expression signatures in the pre-infection period: a pilot study. Med Intensiva 36, 257-263 (2012).

36. E. Tamayo, A. Fernández, R. Almansa, E. Carrasco, L. Goncalves, M. Heredia, D. Andaluz-Ojeda, G. March, L. Rico, J. I. Gomez-Herreras, R. O. de Lejarazu, J. F. Bermejo-Martin, Beneficial role of endogenous immunoglobulin subclasses and isotypes in septic shock. J Crit Care 27, 616-622 (2012).

37. X. Hu, J. Yu, S. D. Crosby, G. A. Storch, Gene expression profiles in febrile children with defined viral and bacterial infection. Proc Natl Acad Sci USA 110, 12792-12797 (2013).

38. G. P. Parnell, A. S. McLean, D. R. Booth, N. J. Armstrong, M. Nalos, S. J. Huang, J. Manak, W. Tang, O. Y. Tam, S. Chan, B. M. Tang, A distinct influenza infection signature in the blood transcriptome of patients with severe community-acquired pneumonia. Crit Care 16, R157 (2012).

39. A. Sutherland, M. Thomas, R. A. Brandon, R. B. Brandon, J. Lipman, B. Tang, A. McLean, R. Pascoe, G. Price, T. Nguyen, G. Stone, D. Venter, Development and validation of a novel molecular biomarker diagnostic test for the early detection of sepsis. Crit Care 15, R149 (2011).

40. Y. Tang, H. Xu, X. Du, L. Lit, W. Walker, A. Lu, R. Ran, J. P. Gregg, M. Reilly, A. Pancioli, J. C. Khoury, L. R. Sauerbeck, J. A. Carrozzella, J. Spilker, J. Clark, K. R. Wagner, E. C. Jauch, D. J. Chang, P. Verro, J. P. Broderick, F. R. Sharp, Gene expression in blood changes rapidly in neutrophils and monocytes after ischemic stroke in humans: a microarray study. J Cereb Blood Flow Metab 26, 1089-1102 (2006).

41. B. M. Tang, A. S. McLean, I. W. Dawes, S. J. Huang, R. C. Lin, The use of gene-expression profiling to identify candidate genes in human sepsis. Am J Respir Crit Care Med 176, 676-684 (2007).

42. S. H. Ahn, E. L. Tsalik, D. D. Cyr, Y. Zhang, J. C. van Velkinburgh, R. J. Langley, S. W. Glickman, C. B. Cairns, A. K. Zaas, E. P. Rivers, R. M. Otero, T. Veldman, S. F. Kingsmore, J. Lucas, C. W. Woods, G. S. Ginsburg, V. G. Fowler, Gene expression-based classifiers identify Staphylococcus aureus infection in mice and humans. PLoS One 8, e48979 (2013).

43. T. Dolinay, Y. S. Kim, J. Howrylak, G. M. Hunninghake, C. H. An, L. Fredenburgh, A. F. Massaro, A. Rogers, L. Gazourian, K. Nakahira, J. A. Haspel, R. Landazury, S. Eppanapally, J. D. Christie, N. J. Meyer, L. B. Ware, D. C. Christiani, S. W. Ryter, R. M. Baron, A. M. Choi, Inflammasome-regulated cytokines are critical mediators of acute lung injury. Am J Respir Crit Care Med 185, 1225-1234 (2012).

44. J. E. Berdal, T. E. Mollnes, T. Wæhre, O. K. Olstad, B. Halvorsen, T. Ueland, J. H. Laake, M. T. Furuseth, A. Maagaard, H. Kjekshus, P. Aukrust, C. M. Jonassen, Excessive innate immune response and mutant D222G/N in severe A (H1N1) pandemic influenza. J Infect 63, 308-316 (2011).

45. M. P. Berry, C. M. Graham, F. W. McNab, Z. Xu, S. A. Bloch, T. Oni, K. A. Wilkinson, R. Banchereau, J. Skinner, R. J. Wilkinson, C. Quinn, D. Blankenship, R. Dhawan, J. J. Cush, A. Mejias, O. Ramilo, O. M. Kon, V. Pascual, J. Banchereau, D. Chaussabel, A. O'Garra, An interferon-inducible neutrophil-driven blood transcriptional signature in human tuberculosis. Nature 466, 973-977 (2010).

46. K. Fredriksson, I. Tjäder, P. Keller, N. Petrovic, B. Ahlman, C. Scheele, J. Wernerman, J. A. Timmons, O. Rooyackers, Dysregulation of mitochondrial dynamics and the muscle transcriptome in ICU patients suffering from sepsis induced multiple organ failure. PLoS One 3, e3686 (2008).

47. J. E. McDunn, K. D. Husain, A. D. Polpitiya, A. Burykin, J. Ruan, Q. Li, W. Schierding, N. Lin, D. Dixon, W. Zhang, C. M. Coopersmith, W. M. Dunne, M. Colonna, B. K. Ghosh, J. P. Cobb, Plasticity of the systemic inflammatory response to acute infection during critical illness: development of the riboleukogram. PLoS One 3, e1564 (2008).

48. T. P. Chung, J. M. Laramie, D. J. Meyer, T. Downey, L. H. Tam, H. Ding, T. G. Buchman, I. Karl, G. D. Stormo, R. S. Hotchkiss, J. P. Cobb, Molecular diagnostics in sepsis: from bedside to bench. J Am Coll Surg 203, 585-598 (2006).

49. G. Parnell, A. McLean, D. Booth, S. Huang, M. Nalos, B. Tang, Aberrant cell cycle and apoptotic changes characterise severe influenza A infection--a meta-analysis of genomic signatures in circulating leukocytes. PLoS One 6, e17186 (2011).

50. M. Emonts, Ph.D. thesis, Erasmus University Rotterdam, (2008).

51. P. Khatri, S. Roedder, N. Kimura, K. De Vusser, A. A. Morgan, Y. Gong, M. P. Fischbein, R. C. Robbins, M. Naesens, A. J. Butte, M. M. Sarwal, A common rejection module (CRM) for acute rejection across multiple organs identifies novel therapeutics for organ transplantation. J Exp Med 210, 2205-2221 (2013).

52. T. M. Osborn, J. K. Tracy, J. R. Dunne, M. Pasquale, L. M. Napolitano, Epidemiology of sepsis in patients with traumatic injury. Crit Care Med 32, 2234-2240 (2004).

53. M. J. Pencina, R. B. D'Agostino, O. V. Demler, Novel metrics for evaluating improvement in discrimination: net reclassification and integrated discrimination improvement for normal variables and nested models. Stat Med 31, 101-113 (2012).

54. C. L. Smith, P. Dickinson, T. Forster, M. Craigon, A. Ross, M. R. Khondoker, R. France, A. Ivens, D. J. Lynn, J. Orme, A. Jackson, P. Lacaze, K. L. Flanagan, B. J. Stenson, P. Ghazal, Identification of a human neonatal immune-metabolic network associated with bacterial infection. Nat Commun 5, 4649 (2014).

55. A. G. Vassiliou, N. A. Maniatis, S. E. Orfanos, Z. Mastora, E. Jahaj, T. Paparountas, A. Armaganidis, C. Roussos, V. Aidinis, A. Kotanidou, Induced expression and functional effects of aquaporin-1 in human leukocytes in sepsis. Crit Care 17, R199 (2013).

56. F. Allantaz, D. Chaussabel, D. Stichweh, L. Bennett, W. Allman, A. Mejias, M. Ardura, W. Chung, E. Smith, C. Wise, K. Palucka, O. Ramilo, M. Punaro, J. Banchereau, V. Pascual, Blood leukocyte microarrays to diagnose systemic onset juvenile idiopathic arthritis and follow the response to IL-1 blockade. J Exp Med 204, 2131-2144 (2007).

57. K. Newton, V. M. Dixit, Signaling in innate immunity and inflammation. Cold Spring Harb Perspect Biol 4, (2012).

58. R. Chen, P. Khatri, P. K. Mazur, M. Polin, Y. Zheng, D. Vaka, C. D. Hoang, J. Shrager, Y. Xu, S. Vicent, A. J. Butte, E. A. Sweet-Cordero, A meta-analysis of lung cancer gene expression identifies PTK7 as a survival gene in lung adenocarcinoma. Cancer Res 74, 2892-2902 (2014).

59. F. Hietbrink, L. Koenderman, M. Althuizen, J. Pillay, V. Kamp, L. P. Leenen, Kinetics of the innate immune response after trauma: implications for the development of late onset sepsis. Shock 40, 21-27 (2013).

60. S. A. Madsen-Bouterse, R. Romero, A. L. Tarca, J. P. Kusanovic, J. Espinoza, C. J. Kim, J. S. Kim, S. S. Edwin, R. Gomez, S. Draghici, The transcriptome of the fetal inflammatory response syndrome. Am J Reprod Immunol 63, 73-92 (2010).

61. H. R. Wong, N. Z. Cvijanovich, M. Hall, G. L. Allen, N. J. Thomas, R. J. Freishtat, N. Anas, K. Meyer, P. A. Checchia, R. Lin, M. T. Bigham, A. Sen, J. Nowak, M. Quasney, J. W. Henricksen, A. Chopra, S. Banschbach, E. Beckman, K. Harmon, P. Lahni, T. P. Shanley, Interleukin-27 is a novel candidate diagnostic biomarker for bacterial infection in critically ill children. Crit Care 16, R213 (2012).

62. A. Kwan, M. Hubank, A. Rashid, N. Klein, M. J. Peters, Transcriptional instability during evolving sepsis may limit biomarker based risk stratification. PLoS One 8, e60501 (2013).

63. R. Cavallazzi, C. L. Bennin, A. Hirani, C. Gilbert, P. E. Marik, Is the band count useful in the diagnosis of infection? An accuracy study in critically ill patients. J Intensive Care Med 25, 353-357 (2010).

64. G. Drifte, I. Dunn-Siegrist, P. Tissieres, J. Pugin, Innate immune functions of immature neutrophils in patients with sepsis and severe systemic inflammatory response syndrome. Crit Care Med 41, 820-832 (2013).

65. P. J. Cornbleet, Clinical utility of the band count. Clin Lab Med 22, 101-136 (2002).

66. W. van der Meer, W. van Gelder, R. de Keijzer, H. Willems, Does the band cell survive the 21st century? Eur J Haematol 76, 251-254 (2006).

67. K. Saito, T. Wagatsuma, H. Toyama, Y. Ejima, K. Hoshi, M. Shibusawa, M. Kato, S. Kurosawa, Sepsis is characterized by the increases in percentages of circulating CD4+CD25+ regulatory T cells and plasma levels of soluble CD25. Tohoku J Exp Med 216, 61-68 (2008).

68. F. Venet, C. S. Chung, G. Monneret, X. Huang, B. Horner, M. Garber, A. Ayala, Regulatory T cell populations in sepsis and trauma. J Leukoc Biol 83, 523-535 (2008).

69. D. Grimaldi, S. Louis, F. Pène, G. Sirgo, C. Rousseau, Y. E. Claessens, L. Vimeux, A. Cariou, J. P. Mira, A. Hosmalin, J. D. Chiche, Profound and persistent decrease of circulating dendritic cells is associated with ICU-acquired infection in patients with septic shock. Intensive Care Med 37, 1438-1446 (2011).

70. S. Park, Y. Zhang, S. Lin, T. H. Wang, S. Yang, Advances in microfluidic PCR for point-of-care infectious disease diagnostics. Biotechnol Adv 29, 830-839 (2011).

71. M. A. Poritz, A. J. Blaschke, C. L. Byington, L. Meyers, K. Nilsson, D. E. Jones, S. A. Thatcher, T. Robbins, B. Lingenfelter, E. Amiott, A. Herbener, J. Daly, S. F. Dobrowolski, D. H. Teng, K. M. Ririe, FilmArray, an automated nested multiplex PCR system for multi-pathogen detection: development and application to respiratory tract infection. PLoS One 6, e26047 (2011).

72. G. Smyth, in Bioinformatics and Computational Biology Solutions Using R and Bioconductor, R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber, Eds. (Springer, New York, 2005), pp. 397-420.

73. W. Xu, J. Seok, M. N. Mindrinos, A. C. Schweitzer, H. Jiang, J. Wilhelmy, T. A. Clark, K. Kapur, Y. Xing, M. Faham, J. D. Storey, L. L. Moldawer, R. V. Maier, R. G. Tompkins, W. H. Wong, R. W. Davis, W. Xiao, I. a. H. R. t. I. L.-S. C. R. Program, Human transcriptome array for high-throughput clinical studies. Proc Natl Acad Sci USA 108, 3707-3712 (2011).

74. Y. Koren, L. Carmel, Robust linear dimensionality reduction. IEEE Trans Vis Comput Graph 10, 459-470 (2004).

75. A. Bodor, I. Csabai, M. W. Mahoney, N. Solymosi, rCUR: an R package for CUR matrix decomposition. BMC Bioinformatics 13, 103 (2012).

76. J. Vandesompele, K. De Preter, F. Pattyn, B. Poppe, N. Van Roy, A. De Paepe, F. Speleman, Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol 3, RESEARCH0034 (2002).

77. J. T. Leek, W. E. Johnson, H. S. Parker, A. E. Jaffe, J. D. Storey, The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882-883 (2012).

78. R. K. Auerbach, B. Chen, A. J. Butte, Relating genes to function: identifying enriched transcription factors using the ENCODE ChIP-Seq significance tool. Bioinformatics 29, 1922-1924 (2013).

79. H. G. Roider, T. Manke, S. O'Keeffe, M. Vingron, S. A. Haas, PASTAA: identifying transcription factors associated with sets of co-regulated genes. Bioinformatics 25, 435-442 (2009).

80. K. L. Jeffrey, T. Brummer, M. S. Rolph, S. M. Liu, N. A. Callejas, R. J. Grumont, C. Gillieron, F. Mackay, S. Grey, M. Camps, C. Rommel, S. D. Gerondakis, C. R. Mackay, Positive regulation of immune cell function and inflammatory responses by phosphatase PAC-1. Nat Immunol 7, 274-283 (2006).

81. F. O. Martinez, S. Gordon, M. Locati, A. Mantovani, Transcriptional profiling of the human monocyte-to-macrophage differentiation and polarization: new molecules and patterns of gene expression. J Immunol 177, 7303-7311 (2006).

82. S. Radom-Aizik, F. Zaldivar, S. Y. Leu, P. Galassetti, D. M. Cooper, Effects of 30 min of aerobic exercise on gene expression in human neutrophils. J Appl Physiol (1985) 104, 236-243 (2008).

83. F. He, H. Chen, M. Probst-Kepper, R. Geffers, S. Eifes, A. Del Sol, K. Schughart, A. P. Zeng, R. Balling, PLAU inferred from a correlation network is critical for suppressor function of regulatory T cells. Mol Syst Biol 8, 624 (2012).

84. M. Giefing, S. Winoto-Morbach, J. Sosna, C. Döring, W. Klapper, R. Küppers, S. Böttcher, D. Adam, R. Siebert, S. Schütze, Hodgkin-Reed-Sternberg cells in classical Hodgkin lymphoma show alterations of genes encoding the NADPH oxidase complex and impaired reactive oxygen species synthesis capacity. PLoS One 8, e84928 (2013).

85. J. A. Meyers, D. W. Su, A. Lerner, Chronic lymphocytic leukemia and B and T cells differ in their response to cyclic nucleotide phosphodiesterase inhibitors. J Immunol 182, 5400-5411 (2009).

86. S. Eckerle, V. Brune, C. Döring, E. Tiacci, V. Bohle, C. Sundström, R. Kodet, M. Paulli, B. Falini, W. Klapper, A. B. Chaubert, K. Willenbrock, D. Metzler, A. Bräuninger, R. Küppers, M. L. Hansmann, Gene expression profiling of isolated tumour cells from anaplastic large cell lymphomas: insights into its cellular origin, pathogenesis and relation to Hodgkin lymphoma. Leukemia 23, 2129-2138 (2009).

87. K. A. Stegmann, N. K. Björkström, H. Veber, S. Ciesek, P. Riese, J. Wiegand, J. Hadem, P. V. Suneetha, J. Jaroszewicz, C. Wang, V. Schlaphoff, P. Fytili, M. Cornberg, M. P. Manns, R. Geffers, T. Pietschmann, C. A. Guzman, H. G. Ljunggren, H. Wedemeyer, Interferon-alpha induced TRAIL on natural killer cells is associated with control of hepatitis C virus infection. Gastroenterology 138, 1885-1897 (2010).

88. D. C. Vinh, S. Y. Patel, G. Uzel, V. L. Anderson, A. F. Freeman, K. N. Olivier, C. Spalding, S. Hughes, S. Pittaluga, M. Raffeld, L. R. Sorbara, H. Z. Elloumi, D. B. Kuhns, M. L. Turner, E. W. Cowen, D. Fink, D. Long-Priel, A. P. Hsu, L. Ding, M. L. Paulson, A. R. Whitney, E. P. Sampaio, D. M. Frucht, F. R. DeLeo, S. M. Holland, Autosomal dominant and sporadic monocytopenia with susceptibility to mycobacteria, fungi, papillomaviruses, and myelodysplasia. Blood 115, 1519-1529 (2010).

89. P. Ancuta, K. Y. Liu, V. Misra, V. S. Wacleche, A. Gosselin, X. Zhou, D. Gabuzda, Transcriptional profiling reveals developmental relationship and distinct biological functions of CD16+ and CD16- monocyte subsets. BMC Genomics 10, 403 (2009).

90. N. Novershtern, A. Subramanian, L. N. Lawton, R. H. Mak, W. N. Haining, M. E. McConkey, N. Habib, N. Yosef, C. Y. Chang, T. Shay, G. M. Frampton, A. C. Drake, I. Leskov, B. Nilsson, F. Preffer, D. Dombkowski, J. W. Evans, T. Liefeld, J. S. Smutko, J. Chen, N. Friedman, R. A. Young, T. R. Golub, A. Regev, B. L. Ebert, Densely interconnected transcriptional circuits control cell states in human hematopoiesis. Cell 144, 296-309 (2011).

91. F. Allantaz, D. T. Cheng, T. Bergauer, P. Ravindran, M. F. Rossier, M. Ebeling, L. Badi, B. Reis, H. Bitter, M. D'Asaro, A. Chiappe, S. Sridhar, G. D. Pacheco, M. E. Burczynski, D. Hochstrasser, J. Vonderscher, T. Matthes, Expression profiling of human immune cell subsets identifies miRNA-mRNA regulatory relationships correlated with cell type specific expression. PLoS One 7, e29979 (2012).

92. E. Tsitsiou, A. E. Williams, S. A. Moschos, K. Patel, C. Rossios, X. Jiang, O. D. Adams, P. Macedo, R. Booton, D. Gibeon, K. F. Chung, M. A. Lindsay, Transcriptome analysis shows activation of circulating CD8+ T cells in patients with severe asthma. J Allergy Clin Immunol 129, 95-103 (2012).

93. M. Frankenberger, T. P. Hofer, A. Marei, F. Dayyani, S. Schewe, C. Strasser, A. Aldraihim, F. Stanzel, R. Lang, R. Hoffmann, O. Prazeres da Costa, T. Buch, L. Ziegler-Heitbrock, Transcript profiling of CD16-positive monocytes reveals a unique molecular fingerprint. Eur J Immunol 42, 957-974 (2012).

94. N. Y. Huen, A. L. Pang, J. A. Tucker, T. L. Lee, M. Vergati, C. Jochems, C. Intrivici, V. Cereda, W. Y. Chan, O. M. Rennert, R. A. Madan, J. L. Gulley, J. Schlom, K. Y. Tsang, Up-regulation of proliferative and migratory genes in regulatory T cells from patients with metastatic castration-resistant prostate cancer. Int J Cancer 133, 373-382 (2013).

95. K. C. Malcolm, E. M. Nichols, S. M. Caceres, J. E. Kret, S. L. Martiniano, S. D. Sagel, E. D. Chan, L. Caverly, G. M. Solomon, P. Reynolds, D. L. Bratton, J. L. Taylor-Cousar, D. P. Nichols, M. T. Saavedra, J. A. Nick, Mycobacterium abscessus induces a limited pattern of neutrophil activation that promotes pathogen survival. PLoS One 8, e57402 (2013).

96. N. Rapin, F. O. Bagger, J. Jendholm, H. Mora-Jensen, A. Krogh, A. Kohlmann, C. Thiede, N. Borregaard, L. Bullinger, O. Winther, K. Theilgaard-Mönch, B. T. Porse, Comparing cancer vs normal gene expression profiles identifies new disease entities and common transcriptional programs in AML patients. Blood 123, 894-904 (2014).

97. N. A. Mabbott, J. K. Baillie, H. Brown, T. C. Freeman, D. A. Hume, An expression atlas of human primary cells: inference of gene function from coexpression networks. BMC Genomics 14, 632 (2013).

Example 2 Benchmarking Sepsis Gene Expression Diagnostics Using Public Data

There is no rapidly available gold-standard molecular test that can determine whether a patient with systemic inflammation has an underlying infection. Missed diagnoses of sepsis lead to late treatment and increased mortality, while inappropriate antibiotic use increases antibiotic resistance and can lead to complications (Ferrer et al. (2014) Crit Care Med 42(8):1749-1755; McFarland (2008) Future Microbiol 3(5):563-578; Gaieski et al. (2010) Crit Care Med 38(4):1045-1053). There is thus an urgent, unmet need for new diagnostics that can separate patients with non-infectious inflammation from patients with sepsis (Cohen et al. (2015) Lancet Infect Dis 15(5):581-614).

New diagnostics that can distinguish sepsis from non-infectious inflammation are difficult to derive, because many of the cellular pathways that are activated in response to infection are also activated in response to tissue trauma and non-infectious inflammation. High-throughput ‘omics’ technologies, such as gene expression profiling via microarray, are therefore well suited to studying sepsis, as they allow tens of thousands of genes to be examined simultaneously. Statistical techniques can then be used to derive new classifiers that are optimized for diagnosis. However, high-throughput datasets have far more variables than samples, and so are prone to non-reproducible, overfit results (Shi et al. (2008) BMC Bioinformatics 9 Suppl 9:S10; Ioannidis et al. (2001) Nat Genet 29(3):306-309). Moreover, to increase statistical power, each biomarker study is typically performed in a clinically homogeneous cohort, often using only a single type of microarray. Although this design increases the power to detect differences between the groups being studied, the results are less likely to hold in different clinical cohorts or with different laboratory techniques. Independent validation is therefore necessary to gauge the generalizability of any new classifier derived from high-throughput studies.

To the best of our knowledge, three gene expression diagnostics have been developed using microarray data specifically to separate patients with sepsis from those with non-infectious inflammation. These are an 11-gene set, hereafter referred to as the ‘Sepsis MetaScore’ (SMS) (Sweeney et al. (2015) Sci Transl Med 7(287):287ra71), the FAIM3:PLAC8 ratio (Scicluna et al. (2015) Am J Respir Crit Care Med 192(7):826-835), and the Septicyte Lab (McHugh et al. (2015) PLoS Med 12(12):e1001916). In addition, dozens of publicly available datasets now examine patients with sepsis or acute infections. They span a very broad range of clinical conditions, including different age groups, infection types, comorbid conditions, and control (non-infectious) conditions. This relatively untapped public resource can thus be used to estimate the relative strengths and weaknesses of different diagnostics across a very large number of patient samples. Here we used all available public gene expression data to directly compare the diagnostic power of the three sepsis gene expression diagnostics.

Methods

On Dec. 10, 2015, we completed a systematic search of two public gene expression repositories (NIH GEO and EBI ArrayExpress) using the following terms: sepsis, SIRS, pneumonia, trauma, ICU, infection, acute, shock, and surgery. We automatically excluded non-microarray and non-human data. Then, using the abstracts of the corresponding manuscripts for screening, we eliminated (1) non-clinical, (2) non-time-matched, and (3) non-whole-blood datasets. The remaining datasets were then sorted according to whether the reference group (compared to sepsis) was healthy controls or non-infected SIRS patients. A schematic is shown in FIG. 17. In addition to the data from the systematic search, we included the two longitudinal trauma cohorts from the Inflammation and Host Response to Injury (Glue Grant) consortium, as described previously (Sweeney et al. (2015) Sci Transl Med 7(287):287ra71). Cohorts from the same study run on different microarrays were treated as independent datasets.

All datasets for which raw data were available were renormalized where possible. Affymetrix arrays were renormalized using gcRMA (on arrays with perfect-match probes available) or RMA. Illumina, Agilent, GE, and other commercial arrays were renormalized via normal-exponential background correction followed by quantile normalization. Custom arrays were not renormalized. All data were used in log2-transformed form. Probes were summarized to genes within datasets using a fixed-effects model (Ramasamy et al. (2008) PLoS Med 5(9):e184).
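The quantile-normalization step named above can be illustrated with a short, dependency-free sketch. It assumes each array is supplied as a list of intensities for the same probes in the same order (an assumption of this example), and it breaks ties by probe order rather than averaging them. It is not the pipeline used in the study, which the text describes only by method name; in R, limma's `normalizeBetweenArrays(..., method = "quantile")` performs the equivalent operation.

```python
from statistics import fmean


def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list of probe intensities per array).

    After normalization every array has the same empirical distribution: the value
    at rank r is replaced by the mean, across arrays, of the r-th smallest values.
    Ties are broken by probe order (a simplification).
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must contain the same probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean across arrays of the values at each rank position.
    rank_means = [fmean(s[r] for s in sorted_arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices, low to high
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities for three probes on two arrays.
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Each probe keeps its within-array rank; only the values attached to the ranks change.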

A literature review was conducted to search for gene expression signatures specifically optimized for diagnosing sepsis as compared to non-infected patients. The resulting models were then tested for their power to diagnose sepsis, as measured by the area under the receiver operating characteristic curve (AUC). Datasets in which any of the sepsis scores could not be calculated (because either all up-regulated or all down-regulated genes were missing) were excluded from the final results. For a given comparison (e.g., non-infectious SIRS versus sepsis at admission), mean AUCs were calculated both across all datasets of that type and across non-discovery datasets only, since discovery datasets are expected to overestimate diagnostic power compared with independent validation datasets. Finally, we compared the overlapping validation sets for each pair of diagnostic scores with paired t-tests (e.g., the 11-gene set and the FAIM3:PLAC8 ratio were compared in their ability to diagnose sepsis in GSE74227, E-MEXP-3589, and the Glue Grant neutrophils, as these were the only cohorts that served as validation cohorts for both scores).
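The AUC and the paired comparison described above can be sketched as follows. This is a minimal illustration, not the authors' R code: it computes the empirical AUC through the Mann-Whitney identity AUC = U / (n_cases × n_controls), with tied scores counting one half, and it computes only the paired t statistic and its degrees of freedom. The p-value would then come from a t distribution with n - 1 degrees of freedom (for example, SciPy's `scipy.stats.ttest_rel` returns both). All score and AUC values below are hypothetical.

```python
import math
import statistics


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def auc(scores_cases, scores_controls):
    """Empirical ROC AUC via Mann-Whitney U; a higher score should mean infection."""
    n1, n0 = len(scores_cases), len(scores_controls)
    ranks = average_ranks(list(scores_cases) + list(scores_controls))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    return u / (n1 * n0)


def paired_t(aucs_a, aucs_b):
    """Paired t statistic and degrees of freedom for two scores' AUCs on shared datasets."""
    diffs = [a - b for a, b in zip(aucs_a, aucs_b)]
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical scores for infected cases and non-infected controls in one dataset.
print(auc([0.9, 0.8], [0.1, 0.2]))  # 1.0
# Hypothetical AUCs of two scores on three shared validation datasets.
print(paired_t([0.90, 0.80, 0.85], [0.80, 0.75, 0.70]))
```

The rank formulation avoids building an explicit ROC curve and handles ties in the same way as the trapezoidal AUC.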

The patient samples in GSE28750 (11) (N=21) were also used in the later dataset GSE74224 (McHugh et al. (2015) PLoS Med 12(12):e1001916) (N=105), although the two datasets were run on different microarray types (Affymetrix HG 2.0 vs. Affymetrix Exon 1.0 ST). As a result, GSE28750 is not included in the validation calculation for the Septicyte Lab (discovered in GSE74224). Conversely, in computing the validation mean for the Sepsis MetaScore, the AUC in GSE74224 was penalized to account for the fact that 20% of the GSE74224 patients (21 of 105) were present in discovery: the penalized AUC was chosen such that 0.8 × (penalized AUC) + 0.2 × (actual AUC in GSE28750) = actual AUC in GSE74224.
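Solving the penalization equation above for the penalized AUC gives (observed AUC in GSE74224 - 0.2 × AUC in GSE28750) / 0.8. A small helper makes the arithmetic explicit; the function name and the example AUC values are hypothetical, chosen only to illustrate the calculation. Note that the adjustment treats the overall AUC as a sample-size-weighted average of subset AUCs, which is an approximation rather than an exact property of the AUC.

```python
def penalized_auc(observed_auc, overlap_auc, overlap_fraction=0.2):
    """Solve overlap_fraction * overlap_auc + (1 - overlap_fraction) * x = observed_auc for x.

    observed_auc:     AUC measured in the full later dataset (GSE74224 in the text)
    overlap_auc:      AUC in the overlapping discovery samples (GSE28750 in the text)
    overlap_fraction: share of the later dataset seen in discovery (21/105 = 0.2)
    """
    if not 0.0 <= overlap_fraction < 1.0:
        raise ValueError("overlap_fraction must be in [0, 1)")
    return (observed_auc - overlap_fraction * overlap_auc) / (1.0 - overlap_fraction)


# Hypothetical AUCs, for illustration only (not values from the study).
print(round(penalized_auc(0.90, 0.80), 3))  # 0.925
```

When the overlapping samples score better than the full dataset, the penalized AUC is lower than the observed AUC, removing the optimistic contribution of the discovery samples.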

To test for confounding by infection type, each dataset was screened for the presence of either (1) both Gram-positive and Gram-negative infections, or (2) both bacterial and viral infections. Microbiological determinations as described by the original authors of the data were assumed to be correct. Cases of co-infection were not included in the confounding comparisons. For each dataset that included both classes of interest, each gene expression score was calculated, and the resulting scores were compared between classes within the dataset using a Wilcoxon rank-sum test.
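The within-dataset comparison described above can be sketched as a two-sided Wilcoxon rank-sum test using the normal approximation. This is an assumption-laden illustration: tied values receive average ranks, but no tie or continuity correction is applied to the variance, so p-values can differ from those of R's `wilcox.test`, which by default uses an exact test for small samples without ties. The class labels and scores in the example are hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation) of x versus y.

    Returns (z, p); z > 0 means x tends to have higher scores than y.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first class
    mean_w = n1 * (n1 + n2 + 1) / 2           # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical scores for bacterial versus viral infections in one dataset.
z, p = rank_sum_test([2.1, 2.4, 1.9, 2.8], [1.2, 1.5, 1.7])
print(z, p)
```

A positive z here would correspond to a score that runs higher in bacterial than in viral infections within that dataset.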

Meta-analysis was performed as previously described. Briefly, differential gene expression between non-infectious SIRS and sepsis patients was summarized within datasets using Hedges’ g, and then combined across datasets using a DerSimonian-Laird random-effects model, followed by Benjamini-Hochberg correction. Forest plots for individual genes show the effect within each dataset, as well as the summarized ‘meta-effect’, in log2 space. All analyses were performed using the R statistical computing language. Significance tests were always two-tailed. Code and data to recreate the admission non-infectious SIRS versus sepsis comparisons for the examined gene sets are available at http://khatrilab.stanford.edu/sepsis. The uploaded data are in the renormalized form used here. Glue Grant data are available to researchers who have been approved by the Glue Grant consortium; instructions are on our website. If the results are used in a manuscript, the authors request citations of both this article and the articles that described the original datasets.
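The three meta-analysis steps named above (Hedges’ g within datasets, DerSimonian-Laird pooling across datasets, and Benjamini-Hochberg correction across genes) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' R implementation: it uses the common textbook small-sample correction J = 1 - 3/(4(n0 + n1) - 9) and variance (n0 + n1)/(n0 n1) + g²/(2(n0 + n1)) for g, takes sepsis minus non-infectious SIRS as the positive direction, and uses a normal approximation for the pooled p-value.

```python
import math
import statistics
from statistics import NormalDist


def hedges_g(controls, cases):
    """Bias-corrected standardized mean difference (cases minus controls) and its variance."""
    n0, n1 = len(controls), len(cases)
    s_pooled = math.sqrt(((n0 - 1) * statistics.variance(controls)
                          + (n1 - 1) * statistics.variance(cases)) / (n0 + n1 - 2))
    d = (statistics.fmean(cases) - statistics.fmean(controls)) / s_pooled
    j = 1 - 3 / (4 * (n0 + n1) - 9)  # small-sample correction factor
    g = j * d
    var_g = (n0 + n1) / (n0 * n1) + g ** 2 / (2 * (n0 + n1))
    return g, var_g


def dersimonian_laird(effects, variances):
    """Random-effects pooled effect, its standard error, tau^2, and two-sided p-value."""
    w = [1 / v for v in variances]
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    p = 2 * (1 - NormalDist().cdf(abs(pooled / se)))
    return pooled, se, tau2, p


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical log2 expression of one gene in one dataset (SIRS controls, sepsis cases).
g, v = hedges_g([7.1, 7.4, 6.9, 7.2], [8.0, 8.3, 7.8, 8.6])
# Hypothetical per-dataset effects for that gene, pooled across three datasets.
print(dersimonian_laird([g, 1.1, 0.7], [v, 0.2, 0.3]))
```

In practice the pooled p-values for all genes would be passed together to `benjamini_hochberg`, and genes below the chosen FDR threshold (5% in the results that follow) would be reported.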

Results

We performed a systematic search of public gene expression databases (FIG. 17) and also used the two independent Glue Grant trauma cohorts, divided into time-matched bins of never-infected patients and patients within +/- 24 hours of sepsis diagnosis, as previously described (Sweeney et al. (2015) Sci Transl Med 7(287):287ra71). This yielded a total of 39 datasets that matched the criteria, comprising 3241 patient samples (Scicluna et al., supra; McHugh et al., supra; Dolinay et al. (2012) Am J Respir Crit Care Med 185(11):1225-1234; Parnell et al. (2012) Crit Care 16(4):R157; Wong et al. (2007) Physiol Genomics 30(2):146-155; Wynn et al. (2011) Mol Med 17(11-12):1146-1156; Wong et al. (2009) Crit Care Med 37(5):1558-1566; Shanley et al. (2007) Mol Med 13(9-10):495-508; Cvijanovich et al. (2008) Physiol Genomics 34(1):127-134; Almansa et al. (2012) BMC Res Notes 5:401; Irwin et al. (2012) BMC Med Genomics 5:13; van de Weg et al. (2015) PLoS Negl Trop Dis 9(3):e0003522; Emonts M. Polymorphisms in Immune Response Genes in Infectious Diseases and Autoimmune Diseases [Ph.D. thesis]: Erasmus University Rotterdam; 2008; Pankla et al. (2009) Genome Biol 10(11):R127; Zaas et al. (2009) Cell Host Microbe 6(3):207-217; Parnell et al. (2011) PLoS One 6(3):e17186; Bermejo-Martin et al. (2010) Crit Care 14(5):R167; Berry et al. (2010) Nature 466(7309):973-977; Smith et al. (2014) Nat Commun 5:4649; Berdal et al. (2011) J Infect 63(4):308-316; Ahn et al. (2013) PLoS One 8(1):e48979; Mejias et al. (2013) PLoS Med 10(11):e1001549; Hu et al. (2013) Proc Natl Acad Sci USA 110(31):12792-12797; Herberg et al. (2013) J Infect Dis 208(10):1664-1668; Kwissa et al. (2014) Cell Host Microbe 16(1):115-127; Cazalis et al. (2014) Intensive Care Med Exp 2(1):20; Suarez et al. (2015) J Infect Dis 212(2):213-222; Zhai et al. (2015) PLoS Pathog 11(6):e1004869; Conejero et al. (2015) J Immunol 195(7):3248-3261; Xiao et al. (2011) J Exp Med 208(13):2581-2590; and Warren et al. (2009) Mol Med 15(7-8):220-227).

Our literature review revealed three gene expression classifiers specifically optimized to distinguish non-infectious SIRS from sepsis in whole blood samples. These were the 11-gene set (SMS) that we published previously (Sweeney et al., Sci Transl Med 2015), the FAIM3:PLAC8 ratio (Scicluna et al., AJRCCM 2015), and the Septicyte Lab (McHugh et al., PLoS Medicine, 2015). For each sample, the 11-gene score is calculated according to the following formula:

$\sqrt[6]{\mathrm{CEACAM1} \cdot \mathrm{ZDHHC19} \cdot \mathrm{C9orf95} \cdot \mathrm{GNA15} \cdot \mathrm{BATF} \cdot \mathrm{C3AR1}} - \frac{5}{6}\sqrt[5]{\mathrm{KIAA1370} \cdot \mathrm{TGFBI} \cdot \mathrm{MTCH1} \cdot \mathrm{RPGRIP1} \cdot \mathrm{HLA\text{-}DPB1}}$

The FAIM3:PLAC8 ratio is calculated as PLAC8/FAIM3. The Septicyte Lab is calculated as (PLAC8 + LAMP1) - (PLA2G7 + CEACAM4). In all cases, the calculations are performed on log2-transformed data.
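The three score definitions above translate directly into code. This is a minimal sketch, assuming each sample is given as a dictionary mapping gene symbols to log2 expression values (all positive, as `statistics.geometric_mean` requires). When only some genes of a group are measured, the sketch takes the geometric mean over the genes that are present; that handling is an assumption of this example, whereas returning `None` when a whole up- or down-regulated group is missing mirrors the exclusion rule stated in the Methods. Function names are illustrative, not from the original code.

```python
from statistics import geometric_mean

SMS_UP = ["CEACAM1", "ZDHHC19", "C9orf95", "GNA15", "BATF", "C3AR1"]
SMS_DOWN = ["KIAA1370", "TGFBI", "MTCH1", "RPGRIP1", "HLA-DPB1"]


def _geo(expr, genes):
    """Geometric mean over the listed genes present in expr; None if none are present."""
    present = [expr[g] for g in genes if g in expr]
    return geometric_mean(present) if present else None


def sepsis_metascore(expr):
    """11-gene score: geomean(up genes) - (5/6) * geomean(down genes), on log2 data."""
    up, down = _geo(expr, SMS_UP), _geo(expr, SMS_DOWN)
    if up is None or down is None:
        return None  # score cannot be computed if a whole gene group is missing
    return up - (5 / 6) * down


def faim3_plac8_ratio(expr):
    """PLAC8 / FAIM3, computed on log2 values as stated in the text."""
    return expr["PLAC8"] / expr["FAIM3"]


def septicyte_lab(expr):
    """(PLAC8 + LAMP1) - (PLA2G7 + CEACAM4), on log2 values."""
    return (expr["PLAC8"] + expr["LAMP1"]) - (expr["PLA2G7"] + expr["CEACAM4"])


# Hypothetical sample in which every gene has log2 expression 8.0.
genes = SMS_UP + SMS_DOWN + ["PLAC8", "FAIM3", "LAMP1", "PLA2G7", "CEACAM4"]
sample = {g: 8.0 for g in genes}
print(sepsis_metascore(sample), faim3_plac8_ratio(sample), septicyte_lab(sample))
```

Scores are then compared across samples within a dataset (for example, by AUC), so only their relative ordering matters for diagnosis.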

The robustness and reproducibility of each of the three sepsis scores depend on robust and reproducible changes in expression of each of their constituent genes. We therefore examined how consistently the individual genes in each of the three tests changed across 12 whole-blood cohorts comparing non-infected SIRS/trauma patients with sepsis patients. Our meta-analysis of these datasets revealed that each of the 16 genes included in any of the three gene scores (PLAC8 is shared by the FAIM3:PLAC8 ratio and the Septicyte Lab), except CEACAM4, changed in the expected direction (FDR < 5%). Notably, CEACAM4, one of the genes in the Septicyte Lab, was significantly down-regulated only in its corresponding discovery cohort.

Next, we divided the datasets into two broad types of comparison: patients with non-infectious SIRS or trauma versus patients with sepsis or acute infections (Table 1), and healthy controls versus patients with sepsis or acute infection (Table 2). For both types of comparison, we calculated the mean AUC across all datasets and the mean AUC across independent validation datasets only; that is, for each of the three signatures, we excluded its corresponding discovery datasets.

In the non-infectious SIRS/trauma versus sepsis datasets (16 cohorts, 1148 samples, Table 1), paired t-tests on overlapping validation datasets showed no significant differences between the AUCs of the three gene expression diagnostic scores (all p>0.1; FIGS. 18 and 19). When the AUCs from all 16 cohorts were compared (i.e., including the discovery cohorts), the Sepsis MetaScore AUCs were significantly higher than those of the other two gene scores (both p<0.05), with no significant difference between the FAIM3:PLAC8 ratio and the Septicyte Lab. However, these results do not necessarily indicate better overall performance of the Sepsis MetaScore, as 9 of these cohorts were used in its discovery. The FAIM3:PLAC8 ratio showed decreased performance in GSE32707 and GSE40012, but as discussed previously, it was specifically designed to test for the presence of community-acquired pneumonia (CAP) and may not generalize to other forms of non-infectious inflammation (41, 42). Finally, the Septicyte Lab had significantly reduced performance (AUC<0.5) in separating both pediatric SIRS/sepsis patients and hospitalized COPD patients with and without infections. It is possible that this reduction in AUC for the Septicyte Lab is due to differences in clinical circumstances or microarray types compared with its initial discovery cohort.

We next examined datasets that compared healthy controls with patients with sepsis or acute infections (26 datasets, 2417 samples, Table 2). Most, but not all, of these patients had sepsis; however, a sepsis diagnostic test should reasonably be expected to distinguish most infections from healthy controls. Here, both the Sepsis MetaScore and the FAIM3:PLAC8 ratio performed as expected, with mean validation AUCs of 0.96 +/- 0.05 and 0.94 +/- 0.09, respectively (Table 2). However, the Septicyte Lab had an AUC < 0.7 in 12 datasets (43% of total datasets) comprising 1562 samples (64% of total samples), resulting in a mean validation AUC of 0.71 +/- 0.20 (Table 2), significantly lower than both the Sepsis MetaScore and the FAIM3:PLAC8 ratio (both P < 1e-5). Again, this reduced performance may be due to the Septicyte Lab’s inclusion of CEACAM4, which was found to be non-significant in meta-analysis. One could argue that there is no clinical need for a diagnostic to separate healthy controls from patients with sepsis; however, poor performance in distinguishing these two groups may indicate deeper biases and may increase the risk of non-generalizability.

An ideal sepsis diagnostic would not vary in performance with the type of infection present. To study whether any of the diagnostics is biased towards a specific type of infection, we searched all of the included datasets for those comparing patients with bacterial and viral infections, and those comparing Gram positive and Gram negative infections. We then compared both the diagnostic power in detecting these types of infections (relative to healthy controls) and the distributions of scores. Since a higher score indicates a higher likelihood of infection, a score that is consistently lower in one infection type may indicate decreased diagnostic performance for that type, even if no change in diagnostic performance is detected relative to healthy controls.
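Tables 5 and 6 compare score distributions between two infection groups using a Wilcoxon rank-sum statistic and its p-value. The sketch below shows the technique in plain Python. It is a two-sided Wilcoxon rank-sum (Mann-Whitney U) test that assigns average ranks to ties and uses a normal approximation with continuity correction. It does not apply the tie correction to the variance. It is an illustration, not the authors' code. Implementations such as R's `wilcox.test` default to an exact p-value for small samples without ties, so p-values from this sketch can differ from those in the tables. The scores in the usage example are invented.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum / Mann-Whitney U test (normal approximation).

    Returns (W, p). W is the U statistic for sample x: the number of
    (x_i, y_j) pairs with x_i > y_j, with ties counted as 1/2.
    """
    pooled_values = list(x) + list(y)
    n_x, n_y = len(x), len(y)
    # Sort (value, original position) pairs so ranks can be mapped back.
    pooled = sorted((v, i) for i, v in enumerate(pooled_values))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values; each gets the run's average rank.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        i = j + 1
    # Rank sum of x minus its minimum possible value gives U.
    w = sum(ranks[:n_x]) - n_x * (n_x + 1) / 2
    mu = n_x * n_y / 2
    sigma = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    # Continuity correction of 0.5 toward the null mean.
    z = (abs(w - mu) - 0.5) / sigma if sigma > 0 else 0.0
    p = math.erfc(max(z, 0.0) / math.sqrt(2))  # two-sided normal tail
    return w, min(p, 1.0)


# Hypothetical scores for illustration only (not taken from any dataset).
bacterial_scores = [0.9, 1.2, 0.4, 1.5, 0.8]
viral_scores = [-0.3, 0.1, -0.8, 0.2]
w, p = rank_sum_test(bacterial_scores, viral_scores)
print(f"W = {w}, p = {p:.3f}")
```

A W near n_x * n_y (here 20) means the first group scores higher almost everywhere. A W near zero means the opposite. This is how the statistics in Tables 5 and 6 can be read alongside the group means.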

There were 8 datasets that provided information about whether a patient had a bacterial or viral infection. In general, there were few differences between the AUCs for bacterial and viral infections for any of the three scores (Table 3); however, this may be due to small sample sizes and the relatively high AUCs obtained when comparing these infections to healthy controls. Despite these caveats, both the Sepsis MetaScore and the FAIM3:PLAC8 ratio showed higher mean scores in patients with bacterial infections than in those with viral infections in 7/8 datasets tested, with the difference reaching significance in 2/8 datasets for each score (Table 5). The Septicyte Lab, in contrast, did not show a strong trend when comparing bacterial and viral infections, and showed a significantly higher mean in viral than in bacterial infections in one dataset.
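The counts above can be checked against Table 5 with a small tally. The sketch below transcribes the group means and Wilcoxon p-values from Table 5 and, for each score, counts three things: the datasets in which the bacterial mean exceeds the viral mean, the significant results (p < 0.05) in the bacterial-higher direction, and the significant results in the viral-higher direction. The 0.05 threshold is an assumption matching the significance language used in the text.

```python
# Group means and Wilcoxon p-values transcribed from Table 5. Columns, in order:
# E-MEXP-3589, GSE20346, GSE25504 (GPL13667), GSE40012, GSE40396,
# GSE42026, GSE60244, GSE66099.
TABLE5 = {
    "Sepsis MetaScore": {
        "bacterial": [0.372, -0.0292, 0.37, 0.0348, 0.182, 0.531, 0.188, 0.0595],
        "viral": [-0.298, 0.0438, -1.36, -0.114, -0.0661, -0.233, -0.0583, -0.59],
        "p": [0.413, 0.734, 0.011, 0.892, 0.765, 0.00537, 0.28, 0.0808],
    },
    "FAIM3:PLAC8 ratio": {
        "bacterial": [1.25, 1.48, 1.4, 1.36, 1.17, 1.17, 1.25, 1.74],
        "viral": [0.998, 1.18, 1.19, 1.19, 1.26, 1.09, 1.23, 1.55],
        "p": [0.286, 0.0041, 0.0879, 0.511, 0.118, 0.0237, 0.591, 0.252],
    },
    "Septicyte Lab": {
        "bacterial": [7.66, 11.4, 7.62, 11.3, 10.1, 11.9, 8.7, 10.8],
        "viral": [5.54, 12.6, 9.35, 12.1, 10.3, 11.7, 9.05, 10.6],
        "p": [0.111, 0.0825, 0.022, 0.0763, 0.393, 0.488, 0.44, 0.472],
    },
}


def tally(cols, alpha=0.05):
    """Return (n datasets with bacterial mean > viral mean,
    n significant with bacterial higher, n significant with viral higher)."""
    bact_higher = sig_bact = sig_viral = 0
    for m_bact, m_viral, p in zip(cols["bacterial"], cols["viral"], cols["p"]):
        if m_bact > m_viral:
            bact_higher += 1
            sig_bact += p < alpha  # bool adds as 0 or 1
        elif m_viral > m_bact:
            sig_viral += p < alpha
    return bact_higher, sig_bact, sig_viral


results = {score: tally(cols) for score, cols in TABLE5.items()}
for score, (higher, sig_b, sig_v) in results.items():
    n = len(TABLE5[score]["p"])
    print(f"{score}: bacterial > viral in {higher}/{n}; "
          f"significant: {sig_b} bacterial-higher, {sig_v} viral-higher")
```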

There were 8 datasets that provided information about whether a patient had a Gram positive or Gram negative infection. The comparison of Gram positive and Gram negative infections revealed no differences in AUC for either the Sepsis MetaScore or the FAIM3:PLAC8 ratio; the Septicyte Lab showed some variability, but this may reflect high variability in its diagnostic performance versus healthy controls rather than differences between Gram positive and Gram negative infections (Table 4). The Sepsis MetaScore, FAIM3:PLAC8 ratio, and Septicyte Lab showed 2, 1, and 1 datasets, respectively, in which the score was significantly higher in Gram negative than in Gram positive patients (Table 6).
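Because a significant p-value alone does not say which group scored higher, the direction has to be read from the group means. The sketch below transcribes Table 6 and lists, for each score, the datasets with p < 0.05, split by whether the Gram negative or the Gram positive mean was higher. The 0.05 threshold is an assumption matching the text.

```python
# Datasets (columns of Table 6) in table order.
DATASETS = ["GSE13015 GPL6106", "GSE13015 GPL6947", "GSE25504 GPL13667",
            "GSE25504 GPL6947", "GSE33341 GPL571", "GSE40396 GPL10558",
            "GSE66099 GPL570", "GSE69528 GPL10558"]

# (Gram negative means, Gram positive means, Wilcoxon p-values) from Table 6.
TABLE6 = {
    "Sepsis MetaScore": (
        [0.245, 0.416, 0.471, 0.145, 0.157, 0.389, -0.258, 0.123],
        [-0.602, -1.14, -0.314, -0.0457, -0.0935, -0.389, 0.175, -0.291],
        [0.0192, 0.00586, 0.476, 0.78, 0.481, 0.886, 0.0197, 0.0832]),
    "FAIM3:PLAC8 ratio": (
        [1.51, 1.57, 1.42, 1.28, 1.46, 1.22, 1.72, 1.35],
        [1.3, 1.29, 1.33, 1.29, 1.54, 1.11, 1.76, 1.27],
        [0.00975, 0.177, 0.914, 0.78, 0.1, 0.486, 0.166, 0.0664]),
    "Septicyte Lab": (
        [11.8, 12.7, 7.89, 10, 9.47, 10.4, 10.7, 10.2],
        [10.7, 9.95, 7.66, 8.95, 9.48, 9.86, 10.9, 9.6],
        [0.0652, 0.0176, 0.61, 0.156, 0.65, 0.2, 0.307, 0.253]),
}


def significant_by_direction(table, alpha=0.05):
    """For each score, list datasets with p < alpha, keyed by which group's mean was higher."""
    out = {}
    for score, (neg, pos, pvals) in table.items():
        hits = {"Gram negative higher": [], "Gram positive higher": []}
        for name, m_neg, m_pos, p in zip(DATASETS, neg, pos, pvals):
            if p < alpha:
                key = "Gram negative higher" if m_neg > m_pos else "Gram positive higher"
                hits[key].append(name)
        out[score] = hits
    return out


result = significant_by_direction(TABLE6)
for score, hits in result.items():
    print(score, hits)
```

On the Table 6 values, this tally reproduces the 2, 1, and 1 Gram-negative-higher datasets described above. It also shows that the third significant Sepsis MetaScore result (GSE66099) runs in the Gram positive direction.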

Discussion

Here we compared three sepsis gene expression diagnostics (the Sepsis MetaScore, the FAIM3:PLAC8 ratio, and the Septicyte Lab) in all available time-matched, whole blood clinical sepsis datasets. There were no significant differences among the distributions of AUCs across all validation non-infectious SIRS/trauma versus sepsis datasets. However, there were several individual datasets in which the FAIM3:PLAC8 ratio and the Septicyte Lab showed AUCs < 0.7. Notably, the Septicyte Lab also had significantly reduced performance in validation cohorts comparing healthy controls with patients with sepsis or acute infections, showing AUCs < 0.7 in 43% of those datasets. The Septicyte Lab was initially validated in a large, independent cohort of patients from the MARS consortium using targeted qPCR, and showed an overall validation AUC of 0.88 (McHugh et al. (2015) PLoS Med 12(12):e1001916); thus, the reduced performance in our analysis may reflect differences in clinical conditions, differences in technology, or both. Forest plots indicate that the effect size of CEACAM4 in GSE74224 (the discovery cohort for the Septicyte Lab) may be an outlier, which may contribute to its relatively poor generalizability.

There is some evidence of a trend towards higher scores in bacterial than in viral infections for the Sepsis MetaScore and the FAIM3:PLAC8 ratio, while the Septicyte Lab showed some evidence of higher scores in viral infections. There was a statistically non-significant trend towards higher scores in Gram negative than in Gram positive infections for all three scores. However, the differing pathogen types were not matched for illness severity, age, gender, or other clinical confounders within their individual datasets; hence, these trends must be interpreted with caution. For instance, if bacterial infections were generally more severe than viral infections, and Gram negative infections generally more severe than Gram positive infections, then these score differences might reflect confounding by severity. Further testing against confounders will thus be necessary across all cohorts for all tests compared here.

Previously, the Sepsis MetaScore was validated in the Glue Grant neutrophil cohort and in a subset of the healthy vs. infection cohorts (Sweeney et al. (2015) Sci Transl Med 7(287):287ra271). In the additional cohorts tested here, the Sepsis MetaScore continues to show results similar to those of prior validation. The FAIM3:PLAC8 ratio was validated in some of these data in a follow-up publication (Sweeney et al. (2015) Am J Respir Crit Care Med 192(10):1260-1261), though the ratio's original authors pointed out that their gene set was initially designed for the very narrow question of determining the presence of CAP in patients admitted to the ICU with suspected CAP (Scicluna et al. (2015) Am J Respir Crit Care Med 192(10):1261-1262). The Septicyte Lab was tested in cohort E-TABM-1548 (McHugh et al., supra), but we have previously shown that, because of expected changes in the baseline gene expression profile during recovery from surgery, this cohort is not appropriate for testing sepsis diagnostics (Sweeney et al. (2015), supra).

The fact that public data are not used to validate new diagnostics may reflect the difficulty and learning curve that some researchers face in accessing and using these data. Given the difficulty of wrangling the public data into an easily usable form, we have provided a hand-curated, unified repository of these data, along with an R script to easily apply a classifier of interest to the datasets (khatrilab.stanford.edu/sepsis). We recommend that any new gene expression classifier for sepsis be tested in these data to allow easy benchmarking and comparison between classifiers. We recognize that the simple measure of the AUC does not capture every aspect of clinical utility. In addition, a score that repeatedly performs well in a single clinical area can still have great clinical utility even if it fails in a different clinical area, although it would need to be applied with care. Nevertheless, it is important to elucidate the strengths and weaknesses of any new sepsis diagnostic in order to help focus resources on further clinical trials in the areas that show the most promise.
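The benchmarking practice recommended above applies one classifier to many datasets and reports a per-dataset AUC. The repository mentioned above ships an R script for this. The Python sketch below is not that script; it only illustrates the idea under assumed data structures. It assumes a dict of named datasets, each a list of (expression profile, label) pairs with label 1 for infected and 0 for control. It computes each AUC through its Mann-Whitney identity and flags datasets below 0.7, the cutoff used in this comparison. The gene names GENE_UP and GENE_DOWN, the difference-score classifier, and the toy values are all hypothetical.

```python
from statistics import mean, stdev
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical sample layout: (gene -> expression value, label), label 1 = infected.
Sample = Tuple[Dict[str, float], int]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney identity: the probability that a random
    positive outscores a random negative, with ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def benchmark(classifier: Callable[[Dict[str, float]], float],
              datasets: Dict[str, List[Sample]],
              low_auc: float = 0.7):
    """Score every sample in every dataset; return per-dataset AUCs and a summary."""
    per_dataset = {}
    for name, samples in datasets.items():
        scores = [classifier(expr) for expr, _ in samples]
        labels = [label for _, label in samples]
        per_dataset[name] = auc(scores, labels)
    values = list(per_dataset.values())
    summary = {
        "mean": mean(values),
        "sd": stdev(values) if len(values) > 1 else 0.0,
        "below_threshold": sorted(n for n, a in per_dataset.items() if a < low_auc),
    }
    return per_dataset, summary


# Toy example: a hypothetical difference score on two invented datasets.
def toy_classifier(expr):
    return expr["GENE_UP"] - expr["GENE_DOWN"]


toy_datasets = {
    "toy_a": [({"GENE_UP": 5.0, "GENE_DOWN": 2.0}, 1), ({"GENE_UP": 4.0, "GENE_DOWN": 3.0}, 1),
              ({"GENE_UP": 2.0, "GENE_DOWN": 4.0}, 0), ({"GENE_UP": 3.0, "GENE_DOWN": 3.5}, 0)],
    "toy_b": [({"GENE_UP": 3.0, "GENE_DOWN": 3.0}, 1), ({"GENE_UP": 2.0, "GENE_DOWN": 3.0}, 1),
              ({"GENE_UP": 4.0, "GENE_DOWN": 3.0}, 0), ({"GENE_UP": 2.5, "GENE_DOWN": 3.0}, 0)],
}

per_dataset, summary = benchmark(toy_classifier, toy_datasets)
print(per_dataset, summary)
```

Keeping the classifier as a plain function of one expression profile means any score can be plugged in without changing the harness. A ratio, a difference of geometric means, or a fitted model would all work.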

Each of the diagnostic gene sets tested here has both strengths and weaknesses. In general, for any sepsis diagnostic to become clinically useful, it must retain good diagnostic power across a broad range of patient settings in its final form. The microarray cohorts used here allow head-to-head comparisons of the different gene expression diagnostics, but may underestimate the diagnostic performance of any test run on a targeted assay (i.e., without the technical variation across multiple microarray platforms). Thus, further prospective validation of any gene set will be needed prior to its rollout into clinical practice. Still, given the increasing accuracy of the technique, it seems likely that molecular profiling of the host response will become a valuable part of the clinical toolset for diagnosing, treating, and potentially preventing sepsis.

While the preferred embodiments of the invention have been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention.

TABLE 1 Non-infectious SIRS/trauma vs. sepsis/infections datasets. (D): Discovery dataset for the given score. *GSE28750 is a subset of GSE74224, so was counted as discovery for Septicyte Lab. **GSE28750 is a subset of GSE74224 and is treated as described in Methods.

Accession | Microarray type | Clinical comparison | N non-infected SIRS | N sepsis | AUC Sepsis MetaScore | AUC FAIM3:PLAC8 ratio | AUC Septicyte Lab
GSE28750 | GPL570 | post-op vs. sepsis | 11 | 10 | 0.96 (0.92-1) (D) | 0.87 (0.79-0.95) | 0.85 (0.77-0.94)*
GSE32707 | GPL10558 | MICU +/- SIRS vs. sepsis | 55 | 48 | 0.8 (0.75-0.84) (D) | 0.66 (0.6-0.71) | 0.66 (0.61-0.71)
GSE40012 | GPL6947 | ICU SIRS vs. CAP | 24 | 52 | 0.71 (0.65-0.77) (D) | 0.58 (0.51-0.65) | 0.8 (0.75-0.85)
GSE65682 | GPL13667 | ICU non-infected vs. CAP | 33 | 101 | 0.78 (0.74-0.82) | 0.84 (0.8-0.87) (D) | 0.74 (0.7-0.79)
GSE66099 | GPL570 | PICU SIRS vs. sepsis | 30 | 199 | 0.79 (0.76-0.83) (D) | 0.74 (0.7-0.78) | 0.44 (0.38-0.5)
GSE74224 | GPL5175 | post-op vs. sepsis | 31 | 74 | 0.90 (0.87-0.92)** | 0.92 (0.9-0.95) | 0.99 (0.99-1) (D)
E-MEXP-3589 | GPL10332 | Hosp. COPD +/- infection | 14 | 14 | 0.74 (0.65-0.83) | 0.49 (0.38-0.6) | 0.46 (0.36-0.57)
Buffy Coat, Day [1,3) | GPL570 | never-infected trauma vs. trauma with sepsis | 65 | 9 | 0.91 (0.84-0.97) (D) | 0.62 (0.51-0.72) | 0.83 (0.75-0.92)
Buffy Coat, Day [3,6) | GPL570 | never-infected trauma vs. trauma with sepsis | 63 | 17 | 0.89 (0.84-0.94) (D) | 0.84 (0.78-0.9) | 0.73 (0.65-0.8)
Buffy Coat, Day [6,10) | GPL570 | never-infected trauma vs. trauma with sepsis | 50 | 15 | 0.91 (0.86-0.96) (D) | 0.83 (0.77-0.9) | 0.72 (0.64-0.79)
Buffy Coat, Day [10,18) | GPL570 | never-infected trauma vs. trauma with sepsis | 22 | 4 | 0.84 (0.72-0.97) (D) | 0.78 (0.65-0.92) | 0.8 (0.66-0.93)
Buffy Coat, Day [18,24) | GPL570 | never-infected trauma vs. trauma with sepsis | 6 | 4 | 0.96 (0.88-1) (D) | 0.96 (0.88-1) | 0.83 (0.69-0.97)
Neutrophils, Day [1,3) | GGH-1, GGH-2 | never-infected trauma vs. trauma with sepsis | 56 | 10 | 0.72 (0.63-0.82) | 0.74 (0.65-0.83) | 0.68 (0.58-0.77)
Neutrophils, Day [3,6) | GGH-1, GGH-2 | never-infected trauma vs. trauma with sepsis | 55 | 10 | 0.83 (0.75-0.91) | 0.87 (0.79-0.94) | 0.9 (0.84-0.97)
Neutrophils, Day [6,10) | GGH-1, GGH-2 | never-infected trauma vs. trauma with sepsis | 46 | 14 | 0.88 (0.82-0.94) | 0.88 (0.82-0.94) | 0.84 (0.77-0.91)
Neutrophils, Day [10,18] | GGH-1, GGH-2 | never-infected trauma vs. trauma with sepsis | 24 | 3 | 0.89 (0.77-1) | 0.9 (0.79-1) | 0.78 (0.62-0.94)
Overall mean | | | | | 0.844 +/- 0.080 | 0.782 +/- 0.135 | 0.754 +/- 0.145
Mean in validation only | | | | | 0.817 +/- 0.069 | 0.779 +/- 0.139 | 0.729 +/- 0.135

TABLE 2 Healthy vs. sepsis/acute infections. (D): Discovery dataset for the given score.

Accession | Microarray type | Clinical cohort | N healthy | N infected | AUC Sepsis MetaScore | AUC FAIM3:PLAC8 ratio | AUC Septicyte Lab
E-MEXP-3567 | GPL96 | children with meningococcal sepsis | 3 | 12 | 0.97 (0.93-1) | 1 (1-1) | 0.94 (0.89-1)
E-MEXP-3589 | GPL10332 | hospitalized COPD + infection | 4 | 14 | 0.98 (0.95-1) | 0.95 (0.9-1) | 0.32 (0.16-0.48)
E-MTAB-3162 | GPL570 | Dengue, DHF | 15 | 30 | 1 (1-1) | 1 (1-1) | 0.8 (0.73-0.86)
GSE11755 | GPL570 | children w/ meningococcal sepsis | 3 | 6 | 1 (1-1) | 1 (1-1) | 0.78 (0.62-0.93)
GSE13015 | GPL6106 | sepsis, w/wo burkholderia | 10 | 48 | 1 (0.99-1) | 0.98 (0.97-1) | 0.94 (0.9-0.97)
GSE13015 | GPL6947 | sepsis, w/wo burkholderia | 10 | 15 | 1 (1-1) | 1 (1-1) | 0.85 (0.78-0.93)
GSE17156 | GPL571 | Viral challenge, peak symptoms | 56 | 27 | 0.91 (0.87-0.94) | 0.89 (0.85-0.93) | 0.51 (0.45-0.58)
GSE20346 | GPL6947 | bacterial or influenza pneumonia | 36 | 20 | 1 (1-1) | 1 (1-1) | 0.95 (0.92-0.99)
GSE21802 | GPL6102 | Severe influenza | 4 | 12 | 0.98 (0.95-1) | 1 (1-1) | 0.69 (0.55-0.83)
GSE22098 | GPL6947 | children with Staph and Strep infections | 81 | 52 | 0.85 (0.81-0.88) | 0.65 (0.6-0.7) | 0.79 (0.75-0.83)
GSE25504 | GPL13667 | neonatal sepsis | 6 | 14 | 0.92 (0.86-0.98) | 0.83 (0.74-0.92) | 0.42 (0.28-0.56)
GSE25504 | GPL570 | neonatal sepsis | 3 | 2 | 1 (1-1) | 1 (1-1) | 0.83 (0.62-1)
GSE25504 | GPL6947 | neonatal sepsis | 35 | 28 | 0.94 (0.91-0.97) | 0.88 (0.83-0.92) | 0.24 (0.18-0.3)
GSE27131 | GPL6244 | Severe Flu A | 7 | 7 | 1 (1-1) | 1 (1-1) | 1 (1-1)
GSE28750 | GPL570 | sepsis | 20 | 10 | 1 (1-1) (D) | 1 (1-1) | 0.74 (0.64-0.84) (D)
GSE33341 | GPL571 | BSI, S. aureus or E. coli | 43 | 51 | 1 (1-1) | 0.99 (0.98-1) | 0.69 (0.64-0.74)
GSE38900 | GPL10558 | Viral infection | 8 | 28 | 0.89 (0.84-0.95) | 0.7 (0.61-0.79) | 0.64 (0.54-0.74)
GSE38900 | GPL6884 | Viral infection | 31 | 153 | 0.91 (0.89-0.93) | 0.91 (0.89-0.93) | 0.41 (0.35-0.46)
GSE40012 | GPL6947 | ICU - CAP | 18 | 52 | 1 (1-1) (D) | 1 (0.99-1) | 0.89 (0.85-0.93)
GSE40396 | GPL10558 | children viral/bacterial infection + fever | 22 | 30 | 0.97 (0.94-0.99) | 0.95 (0.93-0.98) | 0.77 (0.71-0.83)
GSE42026 | GPL6947 | Bacterial & viral infection | 33 | 59 | 0.97 (0.95-0.98) | 0.98 (0.97-0.99) | 0.74 (0.7-0.79)
GSE51808 | GPL13158 | Dengue, DHF | 9 | 28 | 0.98 (0.95-1) | 1 (1-1) | 1 (0.99-1)
GSE57065 | GPL570 | Septic shock | 25 | 82 | 1 (1-1) | 0.99 (0.99-1) | 0.81 (0.77-0.85)
GSE60244 | GPL10558 | Bacterial, viral, or both | 40 | 118 | 0.96 (0.95-0.97) | 0.84 (0.81-0.87) | 0.64 (0.59-0.68)
GSE65682 | GPL13667 | ICU - CAP | 42 | 101 | 1 (0.99-1) | 0.93 (0.92-0.95) (D) | 0.62 (0.58-0.66)
GSE66099 | GPL570 | Pediatric sepsis | 47 | 199 | 1 (1-1) (D) | 0.99 (0.99-1) | 0.54 (0.49-0.59)
GSE68310 | GPL10558 | Viral infection (initial symptoms) | 243 | 258 | 0.87 (0.85-0.88) | 0.92 (0.91-0.93) | 0.66 (0.64-0.69)
GSE69528 | GPL10558 | Bacterial infections | 55 | 83 | 0.99 (0.99-1) | 0.97 (0.96-0.98) | 0.72 (0.68-0.76)
Mean validation AUC | | | | | 0.963 +/- 0.046 | 0.940 +/- 0.092 | 0.711 +/- 0.203

TABLE 3 Comparison of AUCs for bacterial and viral infections vs. healthy controls. (D): Discovery dataset for the given score.

Accession | Platform | N healthy | N bacterial | N viral | Bacterial AUC Sepsis MetaScore | Viral AUC Sepsis MetaScore | Bacterial AUC FAIM3:PLAC8 ratio | Viral AUC FAIM3:PLAC8 ratio | Bacterial AUC Septicyte Lab | Viral AUC Septicyte Lab
E-MEXP-3589 | GPL10332 | 4 | 4 | 5 | 0.94 (0.84-1) | 1 (1-1) | 1 (1-1) | 1 (1-1) | 0.69 (0.5-0.88) | 0.15 (0.012-0.29)
GSE20346 | GPL6947 | 36 | 12 | 8 | 1 (1-1) | 1 (1-1) | 1 (1-1) | 1 (1-1) | 0.92 (0.87-0.98) | 1 (1-1)
GSE25504 | GPL13667 | 6 | 11 | 3 | 0.98 (0.96-1) | 0.67 (0.47-0.87) | 0.91 (0.84-0.98) | 0.56 (0.35-0.76) | 0.35 (0.21-0.49) | 0.67 (0.47-0.87)
GSE40012 | GPL6947 | 18 | 36 | 11 | 1 (1-1) (D) | 1 (1-1) (D) | 1 (0.99-1) | 1 (1-1) | 0.85 (0.8-0.9) | 0.97 (0.94-1)
GSE40396 | GPL10558 | 22 | 8 | 22 | 0.97 (0.92-1) | 0.96 (0.94-0.99) | 0.93 (0.87-0.99) | 0.96 (0.93-0.99) | 0.74 (0.63-0.85) | 0.79 (0.72-0.85)
GSE42026 | GPL6947 | 33 | 18 | 41 | 0.97 (0.95-1) | 0.96 (0.94-0.98) | 1 (1-1) | 0.97 (0.95-0.99) | 0.75 (0.68-0.82) | 0.74 (0.69-0.8)
GSE60244 | GPL10558 | 40 | 22 | 71 | 0.94 (0.9-0.97) | 0.97 (0.96-0.98) | 0.8 (0.74-0.86) | 0.85 (0.82-0.89) | 0.58 (0.5-0.65) | 0.66 (0.61-0.71)
GSE66099 | GPL570 | 47 | 109 | 11 | 1 (1-1) (D) | 1 (0.98-1) (D) | 0.94 (0.92-0.95) | 0.96 (0.91-1) | 0.63 (0.58-0.67) | 0.56 (0.47-0.66)

TABLE 4 Comparison of AUCs for Gram negative and Gram positive infections vs. healthy controls. (D): Discovery dataset for the given score.

Accession | Platform | N healthy | N Gram negative | N Gram positive | Gram negative AUC Sepsis MetaScore | Gram positive AUC Sepsis MetaScore | Gram negative AUC FAIM3:PLAC8 ratio | Gram positive AUC FAIM3:PLAC8 ratio | Gram negative AUC Septicyte Lab | Gram positive AUC Septicyte Lab
GSE13015 | GPL6106 | 10 | 32 | 13 | 1 (1-1) | 0.98 (0.96-1) | 1 (1-1) | 0.94 (0.89-0.99) | 0.94 (0.91-0.97) | 0.91 (0.85-0.97)
GSE13015 | GPL6947 | 10 | 11 | 4 | 1 (1-1) | 1 (1-1) | 1 (1-1) | 1 (1-1) | 0.91 (0.84-0.97) | 0.7 (0.54-0.86)
GSE25504 | GPL13667 | 6 | 4 | 6 | 1 (1-1) | 0.97 (0.92-1) | 1 (1-1) | 0.83 (0.71-0.95) | 0.42 (0.23-0.6) | 0.33 (0.18-0.49)
GSE25504 | GPL6947 | 35 | 6 | 19 | 1 (0.98-1) | 0.97 (0.95-1) | 0.9 (0.82-0.98) | 0.91 (0.86-0.95) | 0.31 (0.21-0.41) | 0.17 (0.12-0.23)
GSE33341 | GPL571 | 43 | 19 | 32 | 1 (1-1) | 1 (1-1) | 0.98 (0.96-1) | 1 (0.99-1) | 0.66 (0.59-0.74) | 0.7 (0.65-0.76)
GSE40396 | GPL10558 | 22 | 4 | 4 | 0.98 (0.93-1) | 0.95 (0.88-1) | 0.94 (0.86-1) | 0.92 (0.83-1) | 0.89 (0.78-0.99) | 0.59 (0.43-0.75)
GSE66099 | GPL570 | 47 | 44 | 65 | 1 (1-1) (D) | 1 (1-1) (D) | 0.93 (0.9-0.96) | 0.94 (0.92-0.96) | 0.6 (0.54-0.66) | 0.65 (0.6-0.7)
GSE69528 | GPL10558 | 55 | 57 | 24 | 1 (0.99-1) | 0.98 (0.96-1) | 1 (0.99-1) | 0.94 (0.91-0.97) | 0.76 (0.72-0.8) | 0.66 (0.59-0.72)

TABLE 5 Comparison of score distributions for bacterial and viral infections

Measure | E-MEXP-3589 | GSE20346 GPL6947 | GSE25504 GPL13667 | GSE40012 GPL6947 | GSE40396 GPL10558 | GSE42026 GPL6947 | GSE60244 GPL10558 | GSE66099 GPL570
N bacterial | 4 | 12 | 11 | 36 | 8 | 18 | 22 | 109
N viral | 5 | 8 | 3 | 11 | 22 | 41 | 71 | 11
Sepsis MetaScore mean bacterial | 0.372 | -0.0292 | 0.37 | 0.0348 | 0.182 | 0.531 | 0.188 | 0.0595
Sepsis MetaScore mean viral | -0.298 | 0.0438 | -1.36 | -0.114 | -0.0661 | -0.233 | -0.0583 | -0.59
Sepsis MetaScore Wilcoxon statistic | 14 | 43 | 32 | 192 | 95 | 536 | 901 | 792
Sepsis MetaScore Wilcoxon P-value | 0.413 | 0.734 | 0.011 | 0.892 | 0.765 | 0.00537 | 0.28 | 0.0808
FAIM3:PLAC8 ratio mean bacterial | 1.25 | 1.48 | 1.4 | 1.36 | 1.17 | 1.17 | 1.25 | 1.74
FAIM3:PLAC8 ratio mean viral | 0.998 | 1.18 | 1.19 | 1.19 | 1.26 | 1.09 | 1.23 | 1.55
FAIM3:PLAC8 ratio Wilcoxon statistic | 15 | 84 | 28 | 225 | 54 | 506 | 841 | 726
FAIM3:PLAC8 ratio Wilcoxon P-value | 0.286 | 0.0041 | 0.0879 | 0.511 | 0.118 | 0.0237 | 0.591 | 0.252
Septicyte Lab mean bacterial | 7.66 | 11.4 | 7.62 | 11.3 | 10.1 | 11.9 | 8.7 | 10.8
Septicyte Lab mean viral | 5.54 | 12.6 | 9.35 | 12.1 | 10.3 | 11.7 | 9.05 | 10.6
Septicyte Lab Wilcoxon statistic | 17 | 25 | 2 | 127 | 69 | 412 | 695 | 679
Septicyte Lab Wilcoxon P-value | 0.111 | 0.0825 | 0.022 | 0.0763 | 0.393 | 0.488 | 0.44 | 0.472

TABLE 6 Comparison of score distributions for Gram negative and Gram positive infections

Measure | GSE13015 GPL6106 | GSE13015 GPL6947 | GSE25504 GPL13667 | GSE25504 GPL6947 | GSE33341 GPL571 | GSE40396 GPL10558 | GSE66099 GPL570 | GSE69528 GPL10558
N Gram negative | 32 | 11 | 4 | 6 | 19 | 4 | 44 | 57
N Gram positive | 13 | 4 | 6 | 19 | 32 | 4 | 65 | 24
Sepsis MetaScore mean Gram negative | 0.245 | 0.416 | 0.471 | 0.145 | 0.157 | 0.389 | -0.258 | 0.123
Sepsis MetaScore mean Gram positive | -0.602 | -1.14 | -0.314 | -0.0457 | -0.0935 | -0.389 | 0.175 | -0.291
Sepsis MetaScore Wilcoxon statistic | 301 | 42 | 16 | 52 | 341 | 9 | 1050 | 852
Sepsis MetaScore Wilcoxon P-value | 0.0192 | 0.00586 | 0.476 | 0.78 | 0.481 | 0.886 | 0.0197 | 0.0832
FAIM3:PLAC8 ratio mean Gram negative | 1.51 | 1.57 | 1.42 | 1.28 | 1.46 | 1.22 | 1.72 | 1.35
FAIM3:PLAC8 ratio mean Gram positive | 1.3 | 1.29 | 1.33 | 1.29 | 1.54 | 1.11 | 1.76 | 1.27
FAIM3:PLAC8 ratio Wilcoxon statistic | 310 | 33 | 11 | 52 | 219 | 11 | 1200 | 862
FAIM3:PLAC8 ratio Wilcoxon P-value | 0.00975 | 0.177 | 0.914 | 0.78 | 0.1 | 0.486 | 0.166 | 0.0664
Septicyte Lab mean Gram negative | 11.8 | 12.7 | 7.89 | 10 | 9.47 | 10.4 | 10.7 | 10.2
Septicyte Lab mean Gram positive | 10.7 | 9.95 | 7.66 | 8.95 | 9.48 | 9.86 | 10.9 | 9.6
Septicyte Lab Wilcoxon statistic | 282 | 40 | 15 | 80 | 280 | 13 | 1260 | 795
Septicyte Lab Wilcoxon P-value | 0.0652 | 0.0176 | 0.61 | 0.156 | 0.65 | 0.2 | 0.307 | 0.253

1-26. (canceled)
 27. A method for treating sepsis, comprising: (a) measuring the expression levels of two or more biomarkers in a sample obtained from a subject; (b) calculating a composite biomarker value using the expression levels; (c) determining that the composite biomarker value exceeds a threshold value; and (d) administering an antibiotic to the subject.
 28. The method of claim 27, wherein the two or more biomarkers comprise at least three biomarkers.
 29. The method of claim 27, wherein the two or more biomarkers are selected from ADAMTS3, ANKRD22, ANXA3, AP3B2, ARL8A, B3GNT8, BATF, BPI, BST1, C1orf162, C3AR1, C9orf103, C9orf95, CCR1, CD177, CD63, CD82, CEACAM1, CLEC5A, DHRS9, EMR1, FAM89A, FCER1G, FCGR1B, FES, FFAR3, FIG4, GNA15, GPR84, HK3, HP, IL10, IL18R1, KCNE1, LCN2, LIN7A, OSCAR, OSTalpha, P2RX1, PADI2, PECR, PLAC8, PLB1, PNPLA1, PPM1M, PSTPIP2, RETN, RGL4, S100A12, SEPHS2, SETD8, SGSH, SIGLEC9, SLC26A8, SPPL2A, SQRDL, TCN1, ZDHHC19, ZDHHC3, ARHGEF18, CACNA2D3, CNNM3, GLO1, GRAMD1C, HACL1, HLA-DPB1, KIAA1370, KLHDC2, METAP1, MRPS35, MTCH1, NOC3L, ODC1, PRKRIR, RPGRIP1, RPUSD4, SETD1B, TBC1D4, TGFBI, TOMM20, UBE2Q2, and WDR75.
 30. The method of claim 27, wherein the two or more biomarkers are selected from CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, C3AR1, KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1.
 31. The method of claim 27, wherein the measuring comprises measuring the mRNA expression levels of the two or more biomarkers selected from CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, C3AR1, KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1.
 32. The method of claim 27, wherein the measuring comprises measuring the protein expression levels of the two or more biomarkers selected from CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, C3AR1, KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1.
 33. The method of claim 27, wherein the expression levels of the two or more biomarkers are increased compared to reference value ranges for the two or more biomarkers, wherein the reference value ranges are derived from one or more patients that do not have inflammation caused by infection.
 34. The method of claim 27, wherein the expression levels of the two or more biomarkers are decreased compared to reference value ranges for the two or more biomarkers, wherein the reference value ranges are derived from one or more patients that do not have inflammation caused by infection.
 35. The method of claim 27, wherein the composite biomarker is determined by a computer comprising a processing unit using a formula based on the expression levels of the two or more biomarkers.
 36. The method of claim 27, wherein the composite biomarker has been validated in multiple cohorts, wherein the multiple cohorts comprise at least one discovery cohort from which the composite biomarker value was derived and at least one validation cohort fully independent from the at least one discovery cohort.
 37. The method of claim 36, wherein the at least one discovery cohort and the at least one validation cohort compare non-healthy, non-infected subjects to time-matched infected subjects.
 38. The method of claim 36, wherein the at least one validation cohort comprises at least three independent cohorts and has at least 200 samples in total.
 39. The method of claim 36, wherein the at least one validation cohort comprises at least five independent cohorts and has at least 500 samples in total.
 40. The method of claim 36, wherein the at least one validation cohort comprises time-matched samples of infected and non-infected patients.
 41. The method of claim 40, wherein the infected patients are sepsis patients.
 42. The method of claim 41, wherein the sepsis patients comprise patients within 48 hours of admission to a hospital for sepsis or within 24 hours from diagnosis of sepsis.
 43. The method of claim 40, wherein the non-infected patients have systemic inflammatory response syndrome (SIRS), an autoimmune disorder, a traumatic injury, or have undergone surgery.
 44. The method of claim 27, wherein the two or more biomarkers are differentially expressed between inflammation caused by infection and inflammation not caused by infection.
 45. The method of claim 36, wherein an area under the receiver operating characteristic (ROC) curve for the composite biomarker is at least 0.75 in the at least one validation cohort.
 46. The method of claim 36, wherein the average area under the receiver operating characteristic (ROC) curve for the composite biomarker across all available validation cohorts is at least 0.75.
 47. The method of claim 36, wherein an area under the receiver operating characteristic (ROC) curve for the composite biomarker is at least 0.85 in the at least one validation cohort.
 48. The method of claim 36, wherein the average area under the receiver operating characteristic (ROC) curve for the composite biomarker across all available validation cohorts is at least 0.83.
 49. The method of claim 27, further comprising obtaining a biological sample from the subject.
 50. The method of claim 47, wherein the biological sample comprises a whole blood, a buffy coat, a plasma, a serum, a urine, a saliva, a tissue biopsy, a peripheral blood mononucleated cell (PBMC), a band cell, a neutrophil, a monocyte, a T cell, or a combination thereof.
 51. The method of claim 27, wherein the measuring comprises microarray analysis, polymerase chain reaction (PCR), quantitative PCR (qPCR), reverse transcriptase polymerase chain reaction (RT-PCR), isothermal amplification, Northern blot, or serial analysis of gene expression (SAGE).
 52. The method of claim 27, further comprising admitting the subject to a hospital.